The Examiner further states that S.N. 08/202,236 and S.N. 08/177,920 (parent application of the 
instant application S.N. 08/475,822) are divisional applications from the parent application 
S.N. 07/158,652. However, the Examiner states that "there is no evidence that the PTO has set 
forth a restriction requirement between the nucleic acids of application Serial No. 08/202,239 and 
the methods of use of those nucleic acids as probes in the parent application Serial 
No. 08/177,920." 

Since this rejection is provisional, Applicants respectfully request that the Examiner hold 
in abeyance the instant rejection. Upon the indication of allowable subject matter in the instant 
application, Applicants reserve the right to file a terminal disclaimer or traverse the rejection. 

The specification is objected to and claims 11-18 are rejected under 35 U.S.C. § 112, first 
paragraph, as the specification allegedly fails to adequately teach how to make and/or use the 
invention, i.e., fails to provide an enabling disclosure. 

The Examiner states that the specification does not teach how to use the invention for the 
claimed diagnostic methods, which include the nucleic acids of ORF-Q, ORF-R, ORF-1, ORF-2, 
ORF-3, ORF-4, and ORF-5, as claimed herein. Allegedly, nucleic acid hybridization to assay 
HIV-1 is not demonstrated. More particularly, the Examiner states that the conditions and 
methods are not given to distinguish HIV-1 from other retroviruses. The Examiner cites 
Hahn et al. as demonstrating that at the time of filing the instant invention, it was known that 
cross-hybridization occurs between the sequences of HIV and members of the HTLV family. The 
Examiner concludes that in view of the specification's alleged lack of sufficient teachings of 
specific hybridization using the claimed probes and Hahn et al., which shows cross-hybridization 
to members of the HTLV family, the specification is non-enabling for the claims. 
Applicants respectfully traverse the rejection. 

LAW OFFICES 
FINNEGAN, HENDERSON, FARABOW, GARRETT & DUNNER, L.L.P. 
1300 I STREET, N.W. 
WASHINGTON, DC 20005 
202-408-4000 

Serial No.: 08/475,822 

The PTO has the burden of establishing a prima facie case of lack of enablement. 
Furthermore, applicants' specification disclosing how to make and use the claimed invention must 
be taken as in compliance with § 112, first paragraph, unless there is a reason to doubt the 
objective truth of the disclosure. In re Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1437, 1442 
(Fed. Cir. 1995); In re Marzocchi, 169 U.S.P.Q. 367, 369 (C.C.P.A. 1971). 

Applicants respectfully submit that the specification provides the necessary guidance to 
teach one of skill in the art how to use the claimed invention. More particularly, it is clear from 
the specification and the skill in the art that one would appreciate that the nucleic acid probes of 
ORF-Q, ORF-R, ORF-1, ORF-2, ORF-3, ORF-4, and ORF-5 are indeed capable of detecting 
the presence or absence of HIV-1. 

Based on such teachings, applicants submit that the enablement requirement is met. 
Indeed, the 35 U.S.C. § 112, First Paragraph, Enablement Training Manual, August, 1996, 
provides that: 

Unless a specification specifically states something to the contrary, the term 
"diagnostic assay" is to be construed to mean any assay that can be used to help 
diagnose a condition, as opposed to an assay that can, in and of itself, diagnose a 
condition. . . Therefore, to enable a diagnostic assay use, a disclosure merely needs 
to teach how to make and use the assay for screening purposes. 

(Id. at 22-23.) Here, the specification provides that "all of the above mentioned peptides can be 
used in diagnostics as sources of immunogens or antigens free of viral particles." (Specification at 
page 6, lines 6-8.) The hybridization assays for the detection of HIV-1 are set forth at page 14, line 
11, through page 15, line 8. Therein, applicants teach that hybridization techniques were well-known 
in the art at the time the application was filed. It is stated that "[u]sing the cloned DNA 
fragments as a molecular hybridization probe - either by marking with radionucleotides or with 
fluorescent reagents - LAV virion RNA may be detected directly in the blood, body fluids and 
blood products [] and vaccines ..." (Specification, page 14, lines 17-22.) (Parenthetical 
removed.) For example, applicants teach that hybridization assays using nucleic acid probes for 
Hepatitis B virus were known in the art. (Specification at page 14, lines 29-32.) 

Further support for the knowledge in the art at the time the claimed invention was made is 
found in Arya et al., "Homology of Genome of AIDS-Associated Virus With Genomes of Human 
T-Cell Leukemia Viruses," Science, 225:927-930 (August 31, 1984) (Exhibit 1). Therein, the 
authors exemplify hybridization experiments between HTLV-I and -II, and HTLV-III. 

In addition, Hahn et al., cited by the Examiner, further depict the use of HTLV probes 
in hybridization assays. (Hahn et al. at 168.) It is noted that the Examiner relies upon Hahn et al. 
to teach the cross-hybridization between sequences of HIV and HTLV, even in stringent 
conditions. (Paper No. 20, at 4.) However, Hahn et al. discuss that the complete genomes of 
HTLV-I, HTLV-Ib, and HTLV-II were digested with restriction enzymes and hybridized with the 
full-length HTLV-III probe in "relaxed conditions." (Hahn et al., page 168, second column, 
lines 3-9.) In particular, the legend of Figure 4 indicates that low stringency hybridization of 8 X 
SSC, 20% formamide, 10% dextran sulphate at 37 °C, and washing conditions of 1 X SSC from 
22-65 °C were used. The fact that Hahn et al. use a low stringency hybridization indicates a 
desire to cross-hybridize with other sequences. Low stringency, one of ordinary skill in the art 
would know, generally leads to greater cross-hybridization results. By using the low stringency 
hybridization conditions, Hahn is obviously attempting to show cross-hybridization between each 
of the "members of the HTLV family." The text of Hahn, at the bottom of page 168, indicates 
such an attempt in order to "evaluate sequence homology." 

Thus, Hahn does not attempt, and cannot be read, to show whether specific 
hybridization is possible with the claimed invention herein. Hahn et al. simply did not attempt an 
experiment that could show that possibility. Figure 4 of Hahn, therefore, cannot support the 
Examiner's conclusion. 

On the other hand, Alizon et al., in "Molecular Cloning of Lymphadenopathy-Associated 
Virus," Nature 312:757-760 (1984) (Exhibit 2), describe discriminating hybridization assays 
using a probe specific for HIV-1. Therein, high stringency hybridization conditions of 50% 
formamide and 5 X SSC at 42 °C and washing conditions of 0.1 X SSC at 68 °C were used. 
Therefore, it would have been readily appreciated that the determination of the hybridization 
conditions is well within the purview of the skilled artisan and dependent upon the goal of the 
particular research. No reasonable evidence to suggest that the nucleic acids recited in the claims 
could not discriminate between different retroviral DNA sequences has been presented by the 
Examiner. 

To the contrary, applicants submit that the claim-designated nucleic acids are unique to 
HIV-1. Therefore, one having skill in the art would acknowledge the use of such nucleic acids as 
probes in hybridization assays. 

For example, ORF-Q corresponds to vif protein of HIV-1. The vif protein (virion 
infectivity factor) is also known as sor, A, P', and Q. (Gallo et al., "HIV/HTLV Gene 
Nomenclature," Nature 333:504 (1988) (Exhibit 3).) The vif protein is not found in HTLV-I or 
HTLV-II (Gallo et al. at 504), and therefore, a nucleic acid probe corresponding to this protein 
would not detect these viruses in a hybridization assay. 

Furthermore, although a vif protein is present in the genome of HIV-2, the nucleotide 
sequences of the vif proteins of HIV-1 and -2 have only about 45% homology. This is shown by 
a comparison of the nucleotide sequence of ORF-Q of HIV-1 given in applicants' specification 
with the nucleotide sequence of ORF-Q of HIV-2 (i.e., vif) given in Guyader et al., "Genome 
Organization and Transactivation of the Human Immunodeficiency Virus Type 2," Nature, 
326:662-669 (1987) (Exhibit 4). Exhibit 5 shows the nucleotide sequence comparison of the two 
sequences. Because there is only about 45% homology between the nucleotide sequences of the 
two proteins, a nucleic acid probe corresponding to vif protein of HIV-1 would not detect the 
presence of HIV-2 in a hybridization assay. 
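
A homology figure such as the roughly 45% discussed above is the kind of number that a 
position-by-position comparison of two aligned sequences yields. As a minimal sketch only, 
assuming the two sequences have already been aligned and that gap columns are excluded from 
the count (one common convention; the exhibits do not state the method actually used), percent 
identity could be computed as follows. The function name and the toy sequences are hypothetical 
and are not taken from the specification or from Guyader et al. 

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap positions at which two sequences match.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length and use '-' to mark gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # gap column: not counted as compared
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable (non-gap) positions")
    return 100.0 * matches / compared


# Toy, made-up sequences for illustration only (not real viral sequence).
probe = "ATGGAAAACAGATGG"
target = "ATGGAGGAGGA-AAG"

print(f"{percent_identity(probe, target):.1f}% identity")
```

Other conventions (for example, counting gap columns as mismatches) would give a different 
percentage for the same alignment, so a stated homology value is only comparable when the 
counting method is the same. 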

The nucleotide sequence of ORF-1 corresponds to vpr protein, also known as R protein, 
of HIV-1. The vpr protein is not found in HTLV-I or -II (Gallo et al. at 504), and therefore, a 
nucleic acid probe corresponding to this protein would not detect these viruses in a hybridization 
assay. 

Furthermore, although a vpr protein is present in the genome of HIV-2, the nucleotide 
sequences of the vpr proteins of HIV-1 and -2 have a homology of only about 39%. This is 
shown by a comparison of the nucleotide sequences of the vpr (ORF-1) of HIV-1 given in 
applicants' specification with the nucleotide sequence of vpr (ORF-R) of HIV-2 given in Guyader 
et al., cited above. Exhibit 6 shows the comparison of the two nucleotide sequences. Because 
there is a homology of only about 39% between the nucleotide sequences of the two proteins, a 
nucleic acid corresponding to vpr protein of HIV-1 would not detect the presence of HIV-2 in a 
hybridization assay. 

The nucleotide sequence of ORF-2 corresponds to tat (transactivator) protein, also known 
as tat-3 or TA protein, of HIV-1. The tat protein is not found in HTLV-I or -II (Gallo et al. at 
504), and therefore, a nucleic acid corresponding to this protein would not detect these viruses in a 
hybridization assay. 

Furthermore, although a tat protein is present in the genome of HIV-2, the nucleotide 
sequences of the first exon of the tat proteins of HIV-1 and -2 have a homology of only about 
48%, and there is almost no homology between the second exon of the tat proteins of HIV-1 and 
-2. This is shown by a comparison of the nucleotide sequences encoding the first exon of the tat 
protein of HIV-1 and HIV-2 (Exhibit 7). The nucleotide sequence of tat protein of HIV-1 
(ORF-2) is given in applicants' specification, and the nucleotide sequence of tat protein of HIV-2 
is given in Guyader et al., cited above. For the sequence of tat protein of HIV-1, see also 
Arya et al., "Three Novel Genes of Human T-lymphotropic Virus Type III: Immune Reactivity of 
Their Products with Sera from Acquired Immune Deficiency Syndrome Patients," Proc. Natl. 
Acad. Sci., USA, 83, 2209-2213 (1986) (Exhibit 8). Because of the minimal homology between 
the nucleotide sequences encoding the two proteins, a nucleic acid corresponding to tat protein of 
HIV-1 would not detect the presence of HIV-2 in a hybridization assay. 

The nucleotide sequence of ORF-4 corresponds to vpu protein of HIV-1. (See, e.g., 
Cohen et al., "Identification of a Protein Encoded by the vpu Gene of HIV-1," Nature, 334, 532-534 
(1988) (Exhibit 9).) This reference gives the amino acid sequence of a protein encoded by 
the vpu gene at page 533, Fig. 1b. A comparison of the nucleotide sequence of ORF-4 and this 
amino acid sequence reveals that the protein of the reference and applicants' nucleic acid 
correspond to the same region of the HIV-1 genome. 

The vpu gene is not found in HTLV-I or -II (Gallo et al. at 504), or in HIV-2 
(Cohen et al. at 534, col. 1). Accordingly, a nucleic acid corresponding to the vpu gene of HIV-1, 
when used as a probe in a hybridization assay, would not detect the presence of HTLV-I, 
HTLV-II, or HIV-2. 

Finally, ORF-3 corresponds to nucleotides 5383-5616 and ORF-5 corresponds to 
nucleotides 7966-8279 of the HIV-1 genome. (Specification at page 13, lines 3 and 5.) ORF-3 is 
located between the end of the pol and Q proteins and the beginning of the env protein of HIV-1. 
(See Wain-Hobson et al., "Nucleotide Sequence of the AIDS Virus, LAV," Cell, 40, 9-17 (1985) 
(Exhibit 10).) Applicants' ORF-3 nucleic acid corresponds to nucleotides 5459-5692 shown at 
page 11 of this reference. ORF-5 is located at the end of the env protein of HIV-1. Applicants' 
ORF-5 nucleic acid corresponds to nucleotides 8042-8354 shown at page 12 of Wain-Hobson 
et al. Corresponding regions are not found in HTLV-I, HTLV-II, or HIV-2. (See the 
nucleotide sequence of HIV-2 given in Guyader et al.; the nucleotide sequence of HTLV-I given 
in Seiki et al., "Human Adult T-cell Leukemia Virus: Complete Nucleotide Sequence of the 
Provirus Genome Integrated in Leukemia Cell DNA," Proc. Natl. Acad. Sci., USA, 80, 3618-3622 
(1983) (Exhibit 11); the nucleotide sequence of the 3' region of HTLV-I and -II given in 
Haseltine et al., "Structure of 3' Terminal Region of Type II Human T Lymphotropic Virus: 
Evidence for New Coding Region," Science, 225, 419-421, 420 (1984) (Exhibit 12); and the 
nucleotide sequence of the 3' region of HTLV-I and -II given in Shimotohno et al., "Nucleotide 
Sequence of the 3' Region of an Infectious Human T-cell Leukemia Virus Type II Genome," 
Proc. Natl. Acad. Sci., USA, 81, 6657-6661, 6659 (1984) (Exhibit 13).) Accordingly, a nucleic 
acid corresponding to ORF-3 or ORF-5 of HIV-1, when used as a probe in a hybridization assay, 
would not detect the presence of HTLV-I, HTLV-II, or HIV-2. 

Based on the foregoing remarks and exhibits, it is clear that the nucleic acids recited in the 
claims are useful to discriminate between retroviruses in diagnostic assays. 

In addition, applicants submit that "The enablement analysis should be based on whether 
there is evidence that one skilled in the art could not have used the compound for any disclosed or 
well-established use [without] undue experimentation." (35 U.S.C. § 112, First Paragraph, 
Enablement Training Manual, August 1996, at 21-22.) Therefore, the Examiner must provide 
evidence that the claimed nucleic acids could not have been used, for example, in hybridization 
assays. No such evidence has been presented. Therefore, a prima facie case of lack of enablement 
has not been made. 

In view of the foregoing remarks, the claimed invention is clearly enabled by the 
specification and withdrawal of the instant rejection is respectfully requested. 

If there are any other fees due in connection with the filing of this response, please charge 
the fees to our Deposit Account No. 06-0916. If a fee is required for an extension of time under 
37 C.F.R. § 1.136 not accounted for above, such an extension is requested and the fee should also 
be charged to our Deposit Account. 

Respectfully submitted, 

FINNEGAN, HENDERSON, FARABOW, 
GARRETT & DUNNER, L.L.P. 

Reg. No. 25,146 

ERNEST F. CHAPMAN 
Dated: December 24, 1996 Reg. No. 25,961 
48. Homology of Genome of AIDS-Associated Virus 
with Genomes of Human T-Cell Leukemia Viruses 

Suresh K. Arya, Robert C. Gallo, Beatrice H. Hahn, George M. Shaw, Mikulas Popovic, 
S. Zaki Salahuddin, Flossie Wong-Staal 



Human T-cell leukemia virus (HTLV) was first identified as an infectious agent etiologically 
associated with adult T-cell leukemia (ATL) (1). A related but distinct retrovirus was isolated 
from a T-cell variant of hairy cell leukemia (2). These viruses, known, respectively, as HTLV-I 
and HTLV-II, show a tropism for human T cells, particularly OKT4+ cells, and have the capacity 
to immortalize and transform normal T cells in culture (3), alter certain T-cell immune functions 
in vitro (4), induce the formation of giant multinucleated T cells (5), and, in some cases, 
selectively kill certain T cells (6). These properties and data from epidemiologic studies of the 
acquired immune deficiency syndrome (AIDS), which is uniformly associated with OKT4+ helper 
cell depletion (7), led us and others to speculate that a member of the HTLV family might be 
the etiological agent of this disease. In support of this hypothesis was the finding that up to 
80 percent of AIDS patients, but less than 1 percent of non-AIDS patients from similar risk 
groups, have serum antibodies that react with the envelope protein of HTLV. However, actual 
isolations of the known subgroups of HTLV (that is, HTLV-I and HTLV-II) from AIDS patients 
were infrequent (10). 

Recently, we reported repeated isolations of a T lymphotropic retrovirus with cytopathic but 
not immortalizing activity from patients with AIDS (11). This virus can be grown in a previously 
immortalized T-cell line (HT) that is relatively resistant to the cytopathic effects of the virus 
and can grow in the absence of T-cell growth factor (interleukin-2) (12). Using the infected cells 
as well as purified virus particles in immunological assays, we found that the serum of 80 to 
100 percent of AIDS patients and 70 to 80 percent of patients with lymphadenopathy syndrome 
reacted positively (13). On the basis of its T-cell tropism, the size and Mg2+ preference of its 
reverse transcriptase, the size of its major core protein (24,000 daltons) (14), some antigenic 
cross-reactivity of its proteins with HTLV-I and HTLV-II (14), and its capacity to induce 
formation of giant multinucleated cells (12), we considered this virus to be a member of the HTLV 
family and designated it HTLV-III. Here we show that certain sequences of the genome of 
HTLV-III and both HTLV-I and HTLV-II are homologous, with the most conserved sequences 
being located within the gag-pol region and less but detectable homology occurring in the env 
and pX region. 

Virus particles were purified from supernatant fluids of HT cells, clone 9 (H9), infected with 
HTLV-III (HTLV-IIIB), by centrifugation through a sucrose density gradient at equilibrium (12). 
HTLV-IIIB was originally obtained from pooled supernatants of short-term lymphocyte cultures 
of AIDS patients. Virus particles were also purified from normal peripheral blood lymphocytes 
newly infected by virus of a primary leukocyte culture of another AIDS patient (HTLV-IIIZ) (11). 
The particles were lysed with sodium dodecyl sulfate (SDS), digested with proteinase K, and 
directly chromatographed on an oligo(dT) cellulose column. The resulting polyadenylate 
[poly(A)]-containing RNA was used as template to synthesize 32P-labeled complementary DNA 
(cDNA) in the presence of oligo(dT) primers. The size of the resultant cDNA ranged from 0.1 to 
10 kb (not shown). When these labeled cDNA's were hybridized to poly(A)-containing RNA 
purified from infected and uninfected H9 cells as well as other uninfected human cell lines, only 
the infected H9 cells contained homologous RNA sequences as evidenced by discrete RNA bands 
after Northern hybridization. Figure 1 shows that cDNA preparations from HTLV-IIIB and 
HTLV-IIIZ gave identical patterns, detecting RNA species of about 9.0, 4.2, and 2.0 kb. These 
bands are similar in size to those corresponding to genomic size messenger RNA (mRNA) and 
spliced mRNA's of env and pX sequences previously observed in cells infected with HTLV-I, 
consistent with the anticipated relatedness of these viruses. Furthermore, viral mRNA bands of 
HTLV-II-infected cells were detected with an HTLV-III cDNA probe (Fig. 1B, lane [illegible]), 
and again the sizes of the mRNA were like those with HTLV-I. 

To determine directly the homology between HTLV-III and HTLV-I and HTLV-II, we hybridized 
HTLV-III cDNA to cloned genomes of HTLV-I and HTLV-II digested with specific restriction 
endonucleases. Complete genomes of a prototype HTLV-I (16), an HTLV-I variant called HTLV-Ib 
(16), and HTLV-II were digested with two restriction enzymes as indicated in the legend to 
Fig. 2 and blot-hybridized to 32P-labeled HTLV-IIIB cDNA. A region spanning the gag and pol 
genes showed the greatest homology. For the prototype HTLV-I, this corresponds to the 
1.[illegible]-kb Pst I-Pst I fragment and 3.3-kb Sst I-Sal I fragment. HTLV-Ib, which lacks a 
Pst I site indicated in parentheses in Fig. 2, revealed the expected 3.0-kb Pst I-Pst I fragment 
instead. Similarly, strong hybridization to the gag-pol sequences of HTLV-II also occurred. This 
is reflected in the 4.2-kb Bam HI-Xho I fragment and the 4.0-kb Bam HI-Eco RI fragment 
(Fig. 2, lanes 5 and 6). 

Fragments corresponding to the env and pX sequences of HTLV-I and HTLV-II also hybridized 
weakly with HTLV-IIIB cDNA (see the 2.1-kb Pst I-Pst I and the [illegible]-kb Sst I-Pst I 
fragment in Fig. 2, lane 1) as did the 1.4-kb Pst I fragment of HTLV-Ib containing only pX 
sequences (Fig. 2, lane 4). The ease of detection of these sequences varied with different 
preparations of cDNA, probably because of variable representations of the 3' end of the virus 
genome. We used cDNA from both HTLV-IIIB and HTLV-IIIZ. Figure 2 shows the results for 
HTLV-IIIB cDNA. Subclones of HTLV-I containing different regions of the genome were 
hybridized to HTLV-IIIZ cDNA (Fig. 3A). With the exception of fragment c, which corresponds 
to an internal portion of the pol gene, all fragments were detected by hybridization, including 
fragment a (LTR-gag) after long exposure of the [illegible]. Similarly, the 3' half of HTLV-II 
contained in the 3.5-kb Bam HI-Bam HI fragment and the 2.[illegible]-kb Bam HI-Xho I fragment 
could be detected with this particular HTLV-III cDNA probe (Fig. 3B). 

Fig. 1. HTLV-III-specific sequences in cellular RNA from HTLV-infected cells. Poly(A)-selected 
cellular RNA was size-separated by formaldehyde-agarose gel electrophoresis, transferred to 
Zeta probe membrane (Bio-Rad Labs) by electroelution and hybridized to (A) HTLV-IIIB cDNA 
and (B) HTLV-IIIZ cDNA. (A and B) Lane 1, uninfected H9 cells (5 µg); lane 2, HTLV-III-infected 
H9 cells (10 µg); lane 3, leukemic Jurkat cells (10 µg); lane 4, HTLV-I-infected C5/MJ cells 
(3 µg); and lane 5, HTLV-II-infected MO cells (5 µg). (B) Lane 6, a longer exposure of lane 5 in 
(B). Poly(A)-selected RNA was prepared by guanidine-HCl extraction and cesium chloride 
centrifugation followed by oligo(dT) cellulose chromatography as described (24). The cDNA 
was transcribed from poly(A)-selected virus-associated RNA with the use of oligo(dT) as a 
primer and avian myeloblastosis virus RNA-directed DNA polymerase as described ([illegible]). 
The hybridization was performed at 37°C for 16 hours in a mixture containing 40 percent 
formamide, 5× standard sodium chloride and sodium citrate (SSC; 0.15M NaCl and 0.015M 
sodium citrate, pH 7), 0.05M sodium phosphate buffer (pH 7), 5× PM (0.02 percent each of 
bovine serum albumin, polyvinylpyrrolidone, and Ficoll 400), yeast RNA (200 µg/ml), denatured 
salmon sperm DNA ([illegible] µg/ml), 0.1 percent SDS, and 10 percent dextran sulfate. The 
membrane was subsequently repeatedly washed with 2× SSC and 0.1 percent SDS at 
[illegible]°C, air-dried, and exposed to Kodak XAR film with the use of intensifying screens. 

Fig. 2. Relatedness of the genome of HTLV-IIIB with the genomes of HTLV-I and HTLV-II. 
Sites of digestion by the relevant restriction enzymes and the expected sizes of the fragments 
are shown below the gels. Cloned HTLV-I (λST), HTLV-Ib (λMC), and HTLV-II (pMO) DNA's 
were digested with the indicated restriction enzymes and fragments were separated by agarose 
gel electrophoresis, transferred to a nitrocellulose membrane (23), and hybridized with 
HTLV-IIIB cDNA. Lanes 1 and 2, HTLV-I (λST) DNA digested with Sst I plus Pst I and Sst I 
plus Sal I, respectively; lanes 3 and 4, HTLV-Ib (λMC) DNA digested with Sst I plus Pst I and 
[illegible] plus Sal I, respectively; lanes 5 and 6, HTLV-II (pMO) DNA digested with Bam HI plus 
[illegible] and Bam HI plus Eco RI, respectively. HTLV-I (λST) and HTLV-Ib (λMC) clones were 
obtained from the genomic libraries of DNA's from ATL patients S.T. and M.C., respectively. 
Cellular DNA's were cloned at the Sst I site of phage λgtWES·λB DNA ([illegible]). HTLV-I 
(λST) is a prototype HTLV-I and HTLV-Ib (λMC) is a variant of HTLV-I that contains some 
divergent restriction enzyme sites, including the lack of the second Pst I site from the 5' end of 
the viral genome (16). HTLV-II (pMO) was obtained by subcloning λMO15A ([illegible]) at the 
Bam HI site of plasmid pBR322 DNA. The cDNA was synthesized as described in Fig. 1 and 
hybridization was performed at 37°C for 16 hours in a mixture containing 30 percent formamide, 
5× SSC, 5× PM, denatured DNA (100 µg/ml), 0.1 percent SDS, and 10 percent dextran sulfate. 
The membrane was subsequently washed and exposed as described in Fig. 1. 

Fig. 3. Relatedness of the genome of HTLV-III with the genomes of HTLV-I and HTLV-II. 
DNA from subclones of HTLV-I and HTLV-II was digested with the indicated restriction 
enzymes. Fragments were separated by agarose gel electrophoresis, transferred to a 
nitrocellulose membrane (24), and hybridized with HTLV-III cDNA. (A) HTLV-I subclones were 
constructed by "shotgun" cloning of fragments generated by codigestion with Pst I and Sst I 
into [illegible] containing fragments designated a to [illegible] on the illustrated restriction 
map of HTLV-I. The viral inserts were released by digestion with the appropriate enzymes. 
(B) HTLV-II (pMO) DNA: Lane 1, digested with Bam HI; lane 2, digested with Bam HI plus Sma I; 
lane 3, digested with Bam HI plus Xho I. The cDNA was synthesized as in Fig. 1 and hybridization 
performed as in Fig. 2, except that the hybridization mixture contained 40 percent formamide. 

Retroviruses called LAV (or sometimes IDAV1 and IDAV2) have been isolated from patients 
with lymphadenopathy syndrome and AIDS (17). Although LAV has been reported to lack 
relatedness to HTLV-I and -II (17), further characterization of its proteins and nucleic acids 
may reveal that LAV is related to these viruses and is identical to or related to HTLV-III. 

The present data showing that certain nucleotide sequences of HTLV-III are homologous to 
sequences of HTLV-I and HTLV-II support our proposal that this virus should be classified 
within the HTLV family. However, HTLV-III is much less related to HTLV-II and HTLV-I than 
HTLV-II and HTLV-I are to each other. It is of interest that still other HTLV-related T 
lymphotropic retroviruses have been identified in Old World monkeys (18). These primate 
viruses are closely related to HTLV-I and only minimally to HTLV-II (19). Although the most 
conserved sequences of HTLV-III are in the region spanning the junction of the predicted gag 
and pol genes, other weakly homologous sequences are also detected in the env and pX genes. 
Homology in the gag and env coding sequences has already been suggested by immunological 
cross-reactivity between these antigens derived from the three subgroups (14). Homology in the 
pX region is an additional demonstration that HTLV-III belongs to the HTLV family, which is 
unique among retroviruses in its possession of the pX genes (20, 21). It is interesting that pX is 
the most conserved region between HTLV-I and HTLV-II (21) and that both of these viruses can 
transform T cells in vitro. In contrast, the pX region is much less conserved in HTLV-III, a 
cytopathic virus that lacks transforming activity (11, 12). 

Comparisons of the LTR regions between HTLV-I and HTLV-II have revealed a conserved 21-bp 
repeat sequence in two otherwise very divergent LTR's (22). The location of this sequence 
upstream of promoter sequences suggests that it is similar to other viral enhancer sequences. 
In view of the tropism of HTLV-III for OKT4+ lymphocytes, it will be interesting to see if this 
virus also has such an enhancer sequence in its LTR. Our present study does not allow us to 
compare specifically the LTR of HTLV-III to those of HTLV-I and -II. However, the weak signal 
obtained with 5' and 3' ultimate fragments containing the LTR suggest that these elements have 
minimal or no homology. 
F'^nfcv L M Wvkf . ft A w«m £di Oifcrd 
Uni* Prtn. Oiford. 19S4I. vo< J pc ii3-i*^ 
:. V S KAiy«funjfuii tt at . Sctfmrf its 

3 I Miyothi ft ai.. Sat^ff 'Lon^c^* IM "0 
it9€li. M Popo%ic. G Une« w,mi»n ? S 
S«nn. 0 MtiiA. ft C GAiio. f^oc ^cti acj^ 
Set USA m. 5402 il9«3). f 0 Markftam 

at .lmt J Ca<H-*f 31. 413 i!9«3». 1 S Y CMn 
ft ai.. Vtf/M/e 'Lom^am, M. 502 il4«3t. M 
Pooovtc tt oi., m Hmm^m T<t\\ LeuMfmia V,. 
fys4s, ft. C G«ilo. M. £tMX. L Grow, lis 
iCoM SpHM H«rbor L»dor«cor>. Coid Sonni 
Htrtor. N Y . in 9n%%t. 

4 M. yppovic ft ai.. ift pfpfiitoft. 

5 M. Po^«c. F Wo«^.Suftl. f S SAnn. ft C 
GWIe. 44v Vir^ O^Koi. 4. 43 ( l»r. K 

P Clavtam. ft. OMiafsoii-fopov. ft. a. Wmt. 

tmt J. CoJicrf ». 321 (iWi. 
6. H. MrtMvt ti 4i:. Scifmcf 223. 1291 1 19S4). 
^ Ctmtn for Oimsm Cowrot TiU Foret on 

V £m^i. J. Mr4. M. 244 UW): J P HMn- 

St. g. f. Wonmer. C. P Maceuirv. L. J 
Lortn». C. t>a«tft, iM. IVT. 4e« mwi. J 
W Cuma . .M. its. 4t 1 1SS4) 
t. ft. C GaMo. P S. S«m. W A BUitfitr. f 
Woag-SUAl. M. ^9«0«<. tM i^imw] III Omeot- 
owy AIDS, I. C. GfoofMM. Ed. >GniiH A 
StrancM. %m 0mm. llStf). 12*1?. M. Csws 
ifl MmtmmTCtU Uukemm Vir^s^s. ft. C. 
04^. M. CtMX. L. Gfou. E4t. iCoM S«nf« 
HM«or I ihiwory. CoM HArter. N Y . 

ta pmt). 

9. M. EsMS ft irf.. 5ci«i«rr 231. I5f (tW). 
221. IQII nW): T. H. Lm 

4c«^. in. C/.i.4.. tfl prtu. 

10. ft.CG«aorr^..5rir'vr2ai.M5Mfl3): a H 

HaJM H m >tCf irrrf immmmf Of^r#<Hnr 

GfOOfNtM. £4t. (Uu. N«w Yoft. la prtu). 

11. ft. C. Odio #f d.. Sofmet m. 300 1 1 Wi 

C. C£. iM.. ft. «r. 

13. M. C. Stf^itfMa. M. PQPOVK. L. gnicfi. i 
SclMptodi. ft. C. Oallo. f. 50*^ M. G. 
SanHiefemfl €r W.. ta pf««ancioa. 

14. J. sTiiiefciLft ff M., Sci#«rr U«. 303 ( IWi 
13. G. PfMdM. P. W«^S(Ml. ft. C. CaUo. ^ 

/v«il. A(a4. ici. (/.ij4.. « arm. 

15. a. Hate #f atf.. /«. y. Cawrr. la pm. 

17. P gMtflawwiii ft a^.. Ware 131. Hft 1 1H3). 
£. VtlMT ft Lawc n Isa M. 7S3 ( l«4); t. 

^iningfr #f ai /r«aM« T^'H/ /^•4/<^«« 
^Vin$$0$, ft. C. M. Cites. L. Gfott. Edt. 

iCoM %mm Htf^ Laaomory. CoM Sfnaf 

Htftar. nTy..mip(«u». 
IS. H.«C.C««.P. Wag^.St^.ft.C.Catto.Jrtr^rf 

as, tt«9MWi. 
». M. Saifa, S. Hanan. Y HirayaM. M Yo*i*.da. 

h9€ SmU. A€^. in. m. 3410119131 

2t G. M. Saa«#f atf.. tM. t1. 4S4a dliai. 
22. i. SoaraOi #f atf.. ^M.. ta arm. 
u 4 n^tnufttic hormone. 4nd no* m inhifeiuni 5«i4l 
tnd i(imul«ted 4ldo«teT0fl< formttion. tu|4«iu (ft4t lu &toia(i- 
ctl 4a(*ui<t 4ft 4n irtiegnl p4rt of tft« homeoit4tic m<cft4ni\m« 
re|ul4tfnf lodium rtttntion. Furthermore, uniike lomatoitinn. 
Its mhifoiiory <5ec: w not rtstnced to 4n|io(cniin-tnmut4ced 
4ldoit«fone lecretjon. but iflecu the form4tion of both 
and stiffluUted mmerilocarticoids. Moreover. At no point «as 
ANf(S-))i observed :o iiimui4(e 4ldc«cefone. The ob«rvicions 
reported here provide the fround«ort for defning ihe mechjn- 
umi Oy i*hich 4tn4l-d«nvtd p*p«»dei aflea todiuffl reteniion 
4nd suti*^^ i*^** p«p<id« ffl4y 6* retponttbu for the tuenu- 
4(ed eifccu o( AN-tt on th< 4dr<n4t cortet dunng todtum 
lo4dinf'*'". Th« ufldtnundinf of torn* clintcji formi of 
idiopathic hypo* 4nd hypertension'* '* m4y therefore rrsutt from 
dcAnifli (h« in«ftciiOA4 b«t««n ANf . ihc 4dren4J cortet and 
the b4iic m«ch4n)tms rc|ul4tin| ANF lecreiion. 

A/t«r lubmtuion of thtt m«nuicnp<. Chamer et ai^ and 
E>lje*n *( rcpon«d findinp iimtUr to ihwe reported here 

Wf thank On fL Guiiletmn 4nd P. Bohlen for their cnncal 
review tnd commenu on thii mtnutcnpt tnd (he secTci4n4l 
tt»rt of the Laborttoncs for Neuroeadocnrtoiocy (^of Heip 
pfcptracioft of tile nitnuscnpc. Thi« reteAi^ch w«ji iupponed by 
innu from the NIH (HD-0W90 4nd AM- 1 It 1 1) end the Roben 
J. Kiebert Jr. tnd Helen C. Kiebori Foundauon. 
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. ««. a « ^ J 




Molecular cloning of
lymphadenopathy-associated virus



*Uo.a ^^en Hii i VWite ei E^wiye 4e )U c >4 rr* t CNU 141, t^J 
tUoit* 4e >■■■> ■iiiw ei U^^mmm Gimtb^ (INICKM U 143. 
CNti t> Jmm tabtw. 25 nM ^ O tM, 
75724 M CMn 15. Fiim 



I (LAV) li « bMM f««r»*iM 
im MMori* fM ft liw mi — i p«tM ««c* If 

_^ friM'i>>y « «r « Mfa fo«« of 

I 4tAeUmej tfdimm {AlMf, CMtf LAV M«t«t 

flM ptfitm oMi AiZ>S or pr«- 
AiOS^ Mri aU avaikWa 4«ia urn BiMt t i m «ick tW *<f«* b«i«« 
lib* cMMd^ afart af AIDS, m ««M li prafafaiarf •« aeitraiorf 
T lifiarnw «*4 bM a a^ipii far (te T-oaU eatac Orr4 <r«f. 



i. C1UUA. J ^ *• U t. 4«hMf. 



LETTERS TO NATURE




' ^ «^ AIM ^AlOS. TW 
uu^^^^J,^ *«iU.«WlT»^^ Hf« 7ft III (KTXV, 
III) AID 5 f i niu (AIV)'*, u** 

c*<rKt«ritt U V ^ ••UcUf ci««l.« Its a ct«*«^ 

LAV cmm^^tm^trj OS A m»4 t€ tcrmm • U^rj W r,c^ 
p4ac« ewfMirf frM tte c«Mk ONa W L> V^«f«ct«4 

4iUf k • f«cr4alM 141^ Hr.i Ig tiu. 

Tht cONA fini-miNl of 1>V lymh^ucd in u 
«ndof«tiOM4. ite«rt«tK^ajv«t«d rucboo. LAV vmoiw ««r« 
punfiad rro0 (fi« tupi^uuiw FKJ otlU, « S-tymphobUstoid 
LAV.pn>du«te« Ua«'\ 4«4 th« moioa wu pnn«<j 
0li|O<d-n. TVw <ONA d<Mm. pUkVIJ. 73 «»d 12. carryiu 
mMtti of 2.x 04 tad 0.» (kb). f«spMUv<jy. wi 

ch«rta*nM*^WOHT (Fif. I ). Ail Uim iriMru h4v« « com»o« 
restnojoa ^ocra <t om tml, mdiciuv« of « cammoA pnmuu 
inc. Th€ 50.4m p«ir (bf) common W«dUN Ai I fncmtnt »«« 
s<qu<fM«d tnd ihowo (o oonum ui di«o<dA) umch pf««)«diA« 




T 
T 



I 1 1 



Id 



«««CrT«CCTT«A«t«<tT 



n*. I ft«MbM msm •f cONA 

M««Mk LAVcONA»wfywMa^wM« 
«aiv«t«d mom. mtit 



I d<fi »< d from lAV 
HI; H, «ini: H, 



of tupvnuuM of tiM I ir prodwiii^ Ftl Hm'* 'Irw iMiiioiM 

10 1.M Tru-HO 7.1, | «M tOTA) Md owJJIZTll 
cms WH rtor, 5<W r^4J««>.TW^»Soiw.. 

^ ' '"^ Trti x.sj 
Si^ ™- *f*< ^ 

^pi M ii^ i i Ally ■ " -jl.Tiii (iirtllj-^; 




TWO. - IT11-I1IIIX ri>vi3 

i:?:di.?;i;.r ' "^^^ • 

t*^4UU^a fn4m*m tMO d ooo d ioco MU«i#4 tad mo<m«. 





n».2 «*0»^do«-W«i,c<uuotrorLAVd««Bno«modn 

< * I .*J: 4^ 2 < 4 ,J of < 

*m 140,080 <.#^ a!*'); fj) t>v. 
' -.-M ^ f«T 

Miptm«Mi frM O) iio f i a id Mroui T (yvo'wicyT* f« dT 
•cb^ry): (2) LAV.f,«d«Ki«, ....^ t tympt^^ (»t 

-ii- <4]| •te^ w« *y-o4ocyf fro* 

LT" ^' " ' ^ 0«*-<* AIM (*T 7.<»8 «.#.«.). 
JJ*fc«*iC«U aitam i«poraiii«i pa4l««d Uro«4* 0 J «J 

•i inrtiwud. C«MMf«i«d «m« wm «^oo«d <hno dn«d cy^M 
(Z«o6tM) ^««4«d « » M $SC ( J M N.a. OJ M tddi*- 
a««KAAjr*^^<«l^jO««i^ilOT). Ihm fcy^nuJ. 
ood «<*-tfmii.uiod fLAVU .«»n (R,. i, ,.o«i<l< 

•ari«y Xtf'e^A P*r i^di for U.ldfc «n«f«m oMd.i.oru 
(50% fomMiMl*. 3 > SSC. 43 -C). »«Ji«d (0. 1 « SSC. 0. 1 % SOS 
*5'C, 2mJ0««), Md ««poMd for 20* (KodU XAW fti« woi 
M iaMMfyia« ktmo) m -70 X. 



of (te J' «nd of 



(HodoMfdCtta.no, 
0 ^A) UNA. 

Tlio «poo«a«y of plAVU oroi dowmiaod ifl • iwioo of filt«f 
hytadtaM osponMua tiaia« tick.«rtmUiod pL>V13 ioMn 

4i « p«6«. nm, « odopcod ipo«^ lodtoiqtio, wo oo«ld 
^ LAV irnio. RNA oor^ T odlt, FM ond oUior 

•^'*!r ^ -ap^Wuhod 
rmia: Rf. 2). LAV on. oIm doMod is • bom aomi* «U 
oitom (n«. 24. tteo 4) ffM 0 tiinpMliir wHH AIOT. ia 
Mpim 9€ihm hmm wn of vifiM ia Um MiponuuaL Uoiafocud 

<A«(Mod ONA « ite SovOm bta of LAV^of«Md T ly«i. 
phoeytw Olid CEM odk (1^ J), No hybcidimioo woi dottcwd 
m ONA froM oWoaod lypfco c ytoiof fw oomoJ li««r (dou 
no( tiio«r«) ifl iho «oflM liybndtiacioa eottdtbooo. A dunatru- 
tic 1 .45^ MtedtU fricmoM wludi oo.«««ntod with «fi im«m«l 
viroJ fr»«Mo« m /ftedtII«dooi^ pLAVI3 <ri$. t ) »0i d€tto«d 
ifl cho SotMlM Mott. loMfo «t 2J 0^ C7 kb »o«o oioo dot «cicd. 
Tofotbof. (booo dou tham due pLAVI3 ONA U «iof«nou4 lo 
tbo buoua ftaooM And 4o<ogu boU UNA oad iottfrmud ONA 
fonttt dofivod froa LAV.iofoood odio. Thiu, pUKViJ u LAV 
spodftc Botac o<i|o<dTVpfi«o4, pLAVU muu conuin the R 
«nd U3 rtfiom of tho lo«« (oraiooi repeat (LTR) u well 4s 
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ONA I 

■ ONA I 
CtMi 
< LAVW«i 



LAV^« 

I. nam i m n M CCM Imm LAV^«i 
MMr«a^ T mOi ate 5 mMmm: 

I r«r J «te Ml ■iM«niiit. iAv itoy 

I «^ UV (iMte MU-LAVT) m I0'4 

9^ ti irftiia MMi. Tte Iter «« is «r 

^ I ■ < 1^ J^^«C I « O a f lt% ^nMiri^tai 

2KtO't^ «r »Qt n < ^LAVIj'TLSr^ (Ti 

IO*c#A f« 10k « Tte ilMT «« «hM « « T 



the 3' <a4 of cb« eodug ragios, j 

M ii Hi. 4L KmmWmm iJT. AJ3I Mi iiS7 tef« 

(iMtoM IRU, or LAYIK «« nte lo ite cm «M 

I « L>VU (A JIf ) 9m4 LAVIi (AJtl). Allf fow 
//tedlf I bo«d« 0^47. 1 .45, 04«atfOL53k^ikt«nif««or 
cormpood tt> boadt io goMuc b4a( of HMIIIh 
ONA (n«. 3. Um S). Moltai kttadi (04 «Ad QJ3 kk) 1 

OOC AM i« (IM OtMUC MOC tiM fOA tity OMtt 



•«««nt»M or lAV pronnJ ONA. Kovtvtr, c*« CL»^ taad 

dlll-^l f«oM« or ^IAVI3. cho 0L5-tt /TMIU 
««m or AJlf awttia. tte RyU3 i«MkHi wnMm tAo LTTL TW 
fifidiAc or two «m«il Hm4m rn«Mua is Um 5* fvpoa rauiTorai 





Hg, 4 K«Kn«m a^ot of 1>V ^ro^r«J ONA •« cioA« aJI9 
(LAV|«) 4a4 ij«t (L>VU). < /ite^lJl f«uMMM uo* Lav 
1 ONA m dM« Ait* ««d iill. Jhom W«4II1 rf*4m««u 
»y fLAVJ3 aurt«rf *. thoM »«, 6y - TS« 
MO «r tte pLAVO «ONA do«c a «Ao«ti. 4. 
> «f iit«. ft^KnOM* u<a: I. ««mHI Jfyn 
H. HMtlU 1, X^l: f. ^1: t, Ca«t: S, u. Wl X* 
I wn ii (te tok « i Mte«. Tor OM wvagr, 

4t%m% maty $^tna^y. 
Mi*^ ONA (mm L>V^«a«o T ottti «m MituUy 

lOoiM Tn».Ha ^Hl, tOaM COTA I M N«a flirt mm rypt 
SW4I fo tor, li K 40.000 A <i««te fnoM (9 « I J kk) %w 

pPiiipiiiiiri 30 114 mt" 4nfm T^O m aam«r *Ad ute «o 
«Tl WfedOvM Trm-MO ^Ht. I«M tOTAJ. iU7.l (f,f. 
U) Hm4tU rag prtovarf W im Hoad^ tte <m cm 
r«aMdlhMNMIII tf f«ft|i4 M|4 ffm4 r >4< ^ ttooM* « M% 

i4i4rf tad uAm «o « Jl^tfo. LiMuoa 
m -»0«4aJ'^ ONA wim « M 
^ p - , I '>«JJjjrT4 0NA^(t«u*a). V- 

Pi ^ ipi fc ^ p*4«i v«t p4aMd mm «■ NM33I «r t 

CMm ncK MM. ^p p riiiMii Q : x I0» piiMH icpmimI 

««■ pvfmitf « M*C « t » ni—fcar^t iiliuia, 0J% SOS. 
2 « SSC 3 «M UnA •H*.(f«ci«i«ri Mn or pLA V 1 3 

«>l9*c#Apop4. FUm««ra*«Mr«r3xM«M.«0.1 > 
SSC 10.1% S0< « ax, ta4 m ICsdftk XAft-S Ala for 

24-10 k own iflUMiTyiso « -10 cl«»« 

^taMi *«n 90«« «o4 UM ri r i wr iiii pk*0m tamM i« C*a. 

^•Oi ONA «« mi «*4 i i f i i ia ite t^pfvpnAM eoadi- 

titm, fWMiai— a4»t ^mn in- rn i ky fcykodnmo Wou f 
MAV13 ONA. ote* .aoi <te r 4.dia« tH. nfj 

«MM M *^*^ "^mm flT ita LTl. Ait «o4 



«r «mi M ONA «w I 



loTdottttif LAVky pwtiW r««h«cio«orniio«k 
ONA. 

A ill MOW 10 boo fwheboa M poAyMp* or i Jll tikiwino 
i«« /TMIII or 2J. 1.45. OA oad 0J2 kk (Fif. 4). 

TW li fMdity doMood ia dM «o«mk b4« by ft 

MAV13 prabo, oJteM* cte 4J^ frapaoM a ML no &odifto 
^MMk.«iaglaMd AJI9 DNA hytiridlM co tU //mdUl 
boa^orAill ia w N o onw liybridaarioa ftad wmIuho oo^iditiom 
iadiooMi tkm A J«l io 0 Hkt4U\ vmrUac tad m « fvoo«B6^n<n( 
wao. Aioa, otkor W ftppid fMhcboa cttM to AJtl art id««ittGAJ 
to (boM or Ailf (ao( liMwa). Tboa, tho H«din fwnaion 
poOflraiatlMSoa«iMf«b4o«Mab««xploaa«dbyv«h4CMM wtthin 
cbo aaok ioo4oto or t>V oMd la iaToo tbo T ctllo. 

KTLV.I*' oad KTLV.Il*^ 1 iiiriii • pair or C^rP* tr^a*- 
fomtog fO frwM i w wiA « c w paa for tiio T^l m6«i,OKT4 
Soti t MB MM (ooMpmiao oaa LTR) om J lofi|' "V Kav« 
fta X fopaa tad «iiow •numnm m^mtam Hoaiolofy TH*y 
bybfidiai botvon th taml vt i ia rMMcuMy mao«n( oondtcio<is 
(40% rarmoauao. 5 xSSC) tad tbo X npom hybndiM o«a «( 
60% fonaooudo**. Thy«, o oo w rvtd X rtpao i$ t hftilmart of 
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this ci$M o( vims. W« hav« compartd cton«d l>v On A tnd 
ciontd HTLV*U ONA ipMO)* by btot.ftvftndiution tnd find 
no croti'^y^^iK^Mii in low i(nn|tncy oondtttont of hvondtz* 
•uofl »nd wuhtnt 1 - J1 X,), even •fur I d«vt ct^osure *( 
-TO*C uiirtf inccfltirynTf iCTteni td«t« not jho**ni 

The human T-lymphotropic retroviruses HTLV-III and
ARV, recently isolated from patients with AIDS or pre-AIDS,
have similar morphological, biochemical and immunological
properties to LAV, which suggests they probably represent
different isolates of the LAV prototype. DNA hybridization
between HTLV-III and HTLV-I and -II has been reported, most
noticeably at the gag-pol junction and less so in the characteristic
X region of HTLV-I and -II. As mentioned above, we could
detect no such hybridization and conclude that the reported
homology must have been due to either (1) the use of an
uncloned cDNA as hybridization probe, (2) the fact that the
isolates in question differ substantially from those we have
cloned, or (3) the possibility that HTLV-III and a HTLV-I/II-
like virus were co-infecting the cells. The last possibility may
also apply to the preliminary report of cross-hybridization
between a LAV-like virus and a cloned HTLV-II DNA probe.
Thus, we find no molecular evidence of a relationship between
LAV and HTLV. Furthermore, the LAV genome is ~9 kb long,
compared with tj kb for th« HTLV vir^s«»'' 'V 0«p<te th«ir 
comp«rm64« |«nom« uxcs. LAV do«s no< crots-hytondu* with 
V*uiu virus" t*9kb) (dau i»o< shown) or with s<ver«J human 
endo««no«s vir«i genomes (rwf. 22 and M. Mama^ penoniui 
commuoAcatioo ) in aon-unnt«At conditions { • 55 *C). Th«s« 
dju aisd mofphoioftc4J and tmmuno4ofical di«airail4ritMs''^ 
b€tww<a LAV aad Um fm.V*i/-tl pair all point to LAV h*in% 
a (Mv«l cUsa o^* huoua retrovirus. 

la oooduaiofi, v« h4v« aol«culArly doned th» ao«ip4«t« 
gcno«M of LAV rrooi frrs^y infeatd acxivattd T oriU of a 
he«iciiy docioe. Ii h«c bt«ti ttewn (hat (h< eropiaa of cnuifl 
rctraviniact rcsidaa ta Um LTR^'^' aad thai s«qu«AOt difltreiMts 
and ittMTtio«a/d<icuoiu am prtMm ia tiM LTRj of 
l«iUaa«oftak aad aoa*l«utat»of«aj< rttrovinaas. b ii thta 
pOMibU that LAV aad LAV-tifca vtruaaa paaaaf«d tluwiffe 
-sod T-tnaafomad ctii linaa**'^" nifM have ua daff oa a torn 
aaeaaatioQ. AHhoufft the cONA dotm were aurtt (torn a 
LAV.produoni l^oeU line, tiM feaoflik doaei vert iaoUtad 
from LAV-infeaed aormaJ T odia. Thus, the dotiet w p r ase a t 
LAV g enomes that have aot beea seiected of adapted to a 
partioiUr oeii line. However, ttm LAV geitome is shown to be 
poJyiaorphk even withja a sia^k isoUu aad iadepeodem iso- 
lacea wtil prohaMy differ widely. 

Tlie avaiiabdiiy of doMd LAV ONA «ho«ld facUiuu the 
undantiaiiiai of tiM •olafMlir ■airtiiaif of virmi ripiicaboa, 
aad (he iropmm of (te ntm, Tte O N A n^wwca of LAV < 
up tiM pwiMiicy af opiwiaf tiM n 
pradMB aad of emdy^ tte MiaeaUf bMM of LA V I 
idty. 

We thaak On a Otmam aad i. Weiaunbach for their 
iacarMi ia dMi Dtmm Osauid, Sophie CHaMffC aad 
JMaqMliM Onm r<ar od^ tk K C Gaiio foe tha HTLV.tl 
pfoha (piMU Or hi BfiMc $m a dOMd Visaa proha (A I0») 
aad Am Cm iw cyfiAc Um MMcripL Tlua woft vM ivppor- 
ud by friM ftw cha CNU, AMOoacioa posir k Itac^ardM 
ooatR la CaMV. dM Fnarfwiw potf U Racharcha h«4dieik 
and (a 
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Molecular cloning of
AIDS-associated retrovirus

Paul A. Luciw, Steven J. Potter, Kathelyn Steimer
& Dino Dina
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UMv«««y ef CahfevM Sm Ftvaaaoo, Caiiromu ^UJ. USA 



Jay A. Levy




^ J ef Albs pertiau aad Cm ^fWao Mtk 
, Xhtm Mlar aaet j iM f ated Wrwa are iy«- 
y,.„...^„ .triatad ^ta <LAV)'. hMaa ly«- 
phipepk (HTLV-lliy" aad AID S aa wci a ced reerrhraa 

<AtV*2r. Wa haea Kiirtiii • tNA fewMae W -9 kiMaaaa 
(khi f«rfaai ^«<«ed rr«M tte cateie wdhM ef a ha«aa T-eriJ 
taaaar Uaa Mamd «tih AtV-X a cONA praha aade (reai fiial 
INA iilirtii dndar ONA aiilicalM aad preHraJ farM ia 
MaMid arilL We pnpafad a Khrary af Maoad atU ON A. taeaM- 
Mmi p^ Madad (has «M a « J-hh piaeM ONA aad «M 
ONA pMBMd «tt la (te #ifle CatU iiM. C iapiritaa 

AAV MMi f MM diffafaat AJ08 p idMM maaiad 




HUT-Tt «a<k, ohgiaattac from a haaao T-oe<t lynphoid 
\ «a«a aaad te peapacate tha AltV.2 stfiia of vinu\ To 
tiM vM fnoM, RNA wai extnoad froci pMriiad 
id ripai'uphwMid oa i^ama $tU nmMin$ methyl 
hydralid•^ A dteiM -t-Hh UNA spacMt waa 
(f^ I) «^ck aMlkr hMamiaaaiw UNA aad loaM 
UNA spaeiaa. TIm Mh RNA ipacMi wu uaed u a 
ithnadoMpriaMniaarevafMtrwchpuaa raacuoo 

^ a ipaciic eONA p<a6a\ UNA of virus ohtained 

ffoai oaUi tafaoad wtth AJtV.3 or wnh two addttionai isolates. 
ARVO aad AJtV-4, iheae d dkuaei beads at 9 kh that hyhndUcd 
with the cONA prahe {f^ 1), 

With thM cONA pra^ «a cxaauaed the ttnicture of nral 
ONA ia tafaoad hy difiina with reatnctjOA enxymcs. 
ilaufnpharaak ia afaiaaa ftii aad Soatheni blo<tin|. No 
tpecUk haadi wan daMCUd ta aa««raJ dt«eAs of ONA from 
uninfeoad ae«to (Fi«. ia. taaaa C £h whereas bands were teen 
in iafecud oella <ri$. 3a. Uae A). Uadife«ed ONA ftam mfec- 
ted <e41s coACaiaed a spades ai 5.5 kb, a rami tpeocs 41 6 kb 
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HIV/HTLV gene
nomenclature 

Sir — The complexities of the genomes of
human retroviruses (the human T-cell
leukaemia viruses, HTLV-I and HTLV-II,
and the AIDS-causing human
immunodeficiency viruses, HIV-1 and
HIV-2) are being unravelled at a rapid pace
which is likely to continue and expand. In
addition to containing a large ensemble of 
positive and negative regulatory genes 
that orchestrate virus expression, these 
viruses are also remarkable in that they
seem to have converged onto parallel 
regulatory pathways. Two of the regula- 
tory genes of the immunodeficiency 
viruses are analogous to the two regula- 
tory genes of the leukaemia viruses, 
although their detailed mechanisms of 
action may be quite different. Decipher- 
ing the modes of action of the regulatory 
genes of these viruses is crucial to the
understanding of their pathogenesis as 
well as to development of therapeutic 
agents. Because of the tremendous acti- 
vity in this field, more than one name has 
sometimes been given to a single gene and 
the same name may also apply to more 
than one gene. In the interest of the many 
new investigators entering the field for the 
first time, we feel it is important that we 
reach a standard nomenclature for all 
known genes of HIV and HTLV. We 
propose the scheme outlined in the table. 

Robert Gallo 
Flossie Wong-Staal 
National Cancer Institute, NIH, 
Bethesda, Maryland 20892, USA

Luc Montagnier
Department of Virology,
Institut Pasteur, 
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Dana-Farber Cancer Institute, 
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Mitsuaki Yoshida
Department of Viral Oncology,
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Proposed name                    Previous names        Molecular        Known function
(and derivation)                                       mass (x 10^-3)

HTLV-I and HTLV-II genes:

tax1 (transactivator)            x-lor, p40x, tat1     41, 41, 42       Transactivator of all viral
tax2                             tat2, TA              38               proteins

rex1 (regulator of expression    pp27x, tei            27               Regulates expression
rex2  of virion proteins)                              25               of virion proteins

tat (transactivator)             tat-3, TA             14               Transactivator of all
                                                                        viral proteins
rev (regulator of expression     art, trs              19, 20           Regulates expression
  of virion proteins)                                                   of virion proteins
vif (virion infectivity factor)  sor, A, P', Q         23               Determines virus
                                                                        infectivity
vpr                              R                     ?                Unknown
nef (negative factor)            3'orf, B, F           27               Reduces virus expression.
                                                                        GTP-binding
vpx (X) (only in HIV-2 and SIV)  X                     16, 14           Unknown



[Figure: genome maps of HTLV-I/II, HIV-1 and HIV-2, each flanked by LTRs, with the gag, tat, vpr and nef genes labelled.]



Vpr and vpx are temporary names and may be changed when more information about their
functions is available. Subscripts 1 and 2 would be used to distinguish genes of HIV-1 and HIV-2
(for example, rev1 and rev2). It is expected that genes of the simian viruses (STLV-I, SIV) would
follow similar nomenclature with the subscripts STLV or SIV as appropriate.






Estimating the incubation period for AIDS patients 



Sir — The nonparametric analyses of the 
data on transfusion-related AIDS
considered by Medley et al. indicate
problems of identifiability. With data obtained
by retrospective determination of the time
of infection for diagnosed AIDS cases, it is
only possible to estimate the early part of
the incubation distribution up to a
constant of proportionality. The same applies
to the total number of infections by blood
transfusion before any given time. The
transfusion data themselves are unable to
discriminate between high infection rates
coupled with long incubation times on the



As do Medley et al., we postulate a
function h(x) which specifies the increase
over time of the number of HIV-infected
individuals who eventually develop
AIDS, and a probability density function
f(s) for the incubation time of those
individuals. The corresponding likelihood
function can be maximized jointly with
respect to h and f. As the likelihood
depends only on the product of h and f, it
is not possible to estimate either of these
functions completely; they may be
individually estimated only up to constants of
proportionality c and c^-1, respectively.
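
The identifiability point can be checked numerically. The following is a minimal sketch in Python, assuming a discrete-time conditional likelihood for cases ascertained retrospectively before a fixed horizon; the function name, the yearly grid and the toy numbers are illustrative assumptions, not the likelihood or data used in the letter. It shows that rescaling h by c and f by 1/c leaves the log-likelihood unchanged.

```python
import math


def log_likelihood(h, f, cases, horizon):
    """Conditional log-likelihood for retrospectively ascertained cases.

    h[x]   relative number of infections in year x (0 <= x < horizon)
    f[s]   weight of an incubation time of s years (0 <= s < horizon)
    cases  list of (x, s) pairs with x + s < horizon, i.e. cases whose
           diagnosis falls inside the observation window
    This discrete form is an assumption made for illustration only.
    """
    # Total weight of all (infection, incubation) pairs that could have
    # been observed before the horizon.
    observable = sum(
        h[x] * f[s]
        for x in range(horizon)
        for s in range(horizon - x)
    )
    # Each case contributes the product h(x) f(s), normalised by the
    # observable weight, so only the product of h and f matters.
    return sum(math.log(h[x] * f[s] / observable) for x, s in cases)


# Hypothetical toy inputs.
h = [1.0, 2.0, 4.0, 8.0]
f = [0.1, 0.2, 0.3, 0.4]
cases = [(0, 1), (1, 2), (2, 0), (0, 3)]

c = 5.0
base = log_likelihood(h, f, cases, horizon=4)
rescaled = log_likelihood([c * v for v in h], [v / c for v in f], cases, horizon=4)
```

Because `base` and `rescaled` agree up to floating-point error, data of this kind cannot separate a high infection rate from a correspondingly smaller incubation density.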



nosed within t years of infection, F(t) =
∫_0^t f(u) du, are given in the figure for the
three age groups considered by Medley et
al. In this figure we show the estimates of
F(t) so that for each group, c = F(7.5). For
the children, the levelling of the estimate
of F(t) by about 3.5 years suggests that the
whole of the distribution of incubation
times has been seen; it may then be
reasonable to suppose that c = 1 but, as also
noted by Medley et al., a second wave of
incubation times that exceed 7.5 years is
not excluded by these data. For the other
two age groups, there is nothing in the
transfusion data themselves to suggest a
value for c. As a consequence, it is impos-
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Micrographs of an icosahedral
'flower' obtained by solidification
of an Al-Li-Cu alloy were generated
by Professor Guinier on an
image processor starting from a
scanning electron micrograph and
using pseudo-colours. See News
and Views p. 640.
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Genome organization and transactivation of the 
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Analysis of the nucleotide sequence of the human retrovirus associated with AIDS in West Africa, HIV-2, shows that it 
is evolutionarily distant from the previously characterized HIV-1. We suggest that these viruses existed long before the
current AIDS epidemics. Their biological properties are conserved in spite of limited sequence homology; this may help the 
determination of the structure-function relationships of the different viral elements. 



The acquired immune deficiency syndrome (AIDS) has now
spread worldwide and appears to be an acute public health
problem in Africa in particular. A retrovirus designated
human immunodeficiency virus (HIV), but previously known
as LAV, HTLV-III or ARV, was shown to cause AIDS in the
different areas afflicted by the epidemics. Indeed, isolates
from North America, Western Europe and Central Africa have
the same biological properties, and antigenically cross-reactive
proteins with the same relative molecular mass. Only studies
at the molecular level have revealed some differences in the
nucleotide sequence of North American and African isolates.
This sequence variation is also present, though to a
lesser extent, among different isolates from the USA.

The western part of Africa seemed relatively spared by AIDS.
Recently, however, several typical cases were found in a survey
of patients from Guinea Bissau and other countries of West
Africa. Unexpectedly, most of these patients did not have
detectable titres of antibodies against HIV. But they were found
to be infected by a retrovirus related to HIV by its ultrastructural
and biological properties, such as cytopathogenicity and tropism
for cells carrying the CD4 (T4) antigen. Antibodies raised
against HIV could immunoprecipitate the gag and pol products
of these isolates, which have molecular masses that are similar
but not identical to these antigens of HIV; in contrast, the env
products could not be immunoprecipitated, whereas previous
HIV isolates showed wide cross-antigenicity of the envelope
glycoprotein. Furthermore, the genome of this new retrovirus
cross-hybridized only poorly in very low stringency conditions
with HIV DNA probes. We have therefore designated this
West African AIDS virus as HIV type 2 (HIV-1 referring to the
AIDS retrovirus previously identified in Central Africa, North
America and Europe). More than 20 isolates have so far been
made from patients with AIDS and related conditions, mainly
originating from West Africa, but also in some Europeans
(L.M., unpublished), and epidemiological studies in progress
indicate a seroprevalence of 1-2% in some populations of West
Africa (F. Brun-Vézinet, personal communication).

HIV-2 appears to be closely related to the simian immunodeficiency
viruses (SIV), a group of cytopathic retroviruses whose
prototype, STLV-3mac, was identified in captive rhesus monkeys
(Macaca mulatta) with an AIDS-like disease, and was later
found to infect other primate species, either wild or in
captivity. Genetic comparisons of SIV, HIV-1 and HIV-2 may
help to elucidate the phylogeny of these viruses and the origins
of the recent AIDS epidemics. As these retroviruses share most 
of their biological properties, the identification of conserved 
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sequences is important to localize the functional domains of the 
viral proteins and regulating elements, and design new diagnostic
and therapeutic tools. We present here the complete nucleotide
sequence of HIV-2, the comparison of its proteins with
those of HIV-1, and preliminary studies on the regulation of
HIV-2 expression.

Nucleotide sequence and LTR analysis 

The sequence presented in Fig. 2 is derived from two λ clones
corresponding to integrated proviral DNA from the ROD isolate
of HIV-2 (ref. 22), obtained in 1985 from an AIDS patient from
Cape Verde Islands (offshore Senegal, refs 19, 20). The genome
of HIV-2 is 9,671 nucleotides long (in its RNA form), whereas
HIV-1 isolates are about 9,200 nucleotides long. This difference
is partly explained by the respective sizes of the long terminal
repeats (LTRs, see below).

The genetic organization of HIV-2 (shown in Fig. 1) is
analogous to that of HIV-1, that is:

5'LTR-gag-pol-central region-env-orf F-3'LTR.

The 'central region', also identified in the ovine lentivirus visna,
contains five major open reading frames (ORFs), four being
clearly related to the ORFs of HIV-1 that encode the Q (or sor),
R, tat and art (or trs) genes of HIV-1 (refs 15-18, 27-31). The
fifth, which we designate ORF X, has no obvious counterpart
in HIV-1. Alignments of the nucleotide sequences of HIV-1 and
2 show their distant homology (from about 60% for the more conserved
gag and pol genes, to 30-40% for the other viral genes and
LTRs). To allow these alignments to be made many insertions
and deletions must be introduced into the sequences. We do
not find that these insertions are the small duplications that
would be characteristic of the recent divergence of retroviral
sequences, as was noted among isolates of HIV-1 (ref. 12).

The limits of the LTRs and of their internal U3, R and U5
elements, determined by sequence analysis and some complementary
experiments, are shown in Fig. 2. Classically bounding
the retroviral LTRs are short inverted repeats
(5'CTG-CAG3') located after a polypurine tract for the 3' LTR,
and before a sequence complementary to the 3' end of a transfer
RNA that is used as primer by the reverse transcriptase (here,
as in HIV-1 and visna virus, a lysine tRNA, refs 15, 27) for the
5' LTR. The R-U5 junction, corresponding to the 3' end of the
polyadenylated viral RNA, was previously localized by sequencing
oligo(dT)-primed complementary DNA (cDNA) derived
from the HIV-2 ROD genome. The length of U5 + R, and hence
the position of the U3-R junction corresponding to the 5' cap
site of the viral RNA, were deduced from the size of a HIV-2
cDNA synthesized using the endogenous reverse transcriptase
activity and the endogenous tRNA(Lys) primer (see Fig. 3). This



NATURE VOL. 326 16 APRIL 1987



-ARTICLES- 






[Figure 1: genome maps of HIV-2 (9.7 kb) and HIV-1 (9.3 kb); labelled open reading frames include env, art 1, art 2 and tat 2.]



Fig. 1 Organization of the HIV-2 and HIV-1 genome (BRU isolate, ref. 15). Vertical bars represent the stop codons in the 3 reading frames.
Arrows indicate the initiator AUG codons in viral genes or potential genes. Tat 1 and 2, art 1 and 2 are the open reading frames containing
the coding exons of the tat and art genes.
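
The kind of map described in this legend (stop codons in each of the three reading frames, plus AUG-initiated open reading frames) can be sketched as follows. This is an illustrative Python sketch only; the function names, the `min_codons` threshold and the toy sequence are assumptions and are not taken from the article.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def to_rna(seq):
    """Normalise a DNA or RNA string to upper-case RNA letters."""
    return seq.upper().replace("T", "U")


def stops_by_frame(seq):
    """Return {frame: [0-based positions]} of stop codons in frames 0, 1, 2."""
    rna = to_rna(seq)
    stops = {0: [], 1: [], 2: []}
    for frame in range(3):
        for i in range(frame, len(rna) - 2, 3):
            if rna[i:i + 3] in STOP_CODONS:
                stops[frame].append(i)
    return stops


def open_reading_frames(seq, min_codons=2):
    """Return (start, end) pairs for AUG-initiated ORFs on the given strand.

    start is the 0-based position of the AUG, end is exclusive and
    includes the stop codon. min_codons counts coding codons before
    the stop; it is a hypothetical filter chosen for illustration.
    """
    rna = to_rna(seq)
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if codon == "AUG" and start is None:
                start = i  # first initiator codon in the current open stretch
            elif codon in STOP_CODONS:
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


toy = "AUGGCUUAAGAUGA"  # hypothetical 14-nucleotide example
toy_stops = stops_by_frame(toy)
toy_orfs = open_reading_frames(toy)
```

A real analysis would also scan the complementary strand and use a much larger minimum length; both are omitted here to keep the sketch short.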



'strong-stop cDNA' is 302 nucleotides long (181 nucleotides
in HIV-1, ref. 15). Thus, the U5 element is 125 bp long,
U3 is 556 bp and R 173 bp (respectively 82, 456 and 97 bp in
HIV-1). All the elements of the HIV-2 LTRs are larger than in
HIV-1, and alignment by computer programs shows large insertions
and very distant overall homology for the aligned regions.
However, the three Sp1 binding sites identified in HIV-1 (ref.
32), are also present in HIV-2 from nucleotide 9,419 to 9,448,
with 17 out of 29 nucleotides homologous to this region of
HIV-1. The core enhancers identified in HIV-1 (ref. 33) are
present in HIV-2 from nucleotide 9,389 to 9,416: the first is 50%
homologous and the second 100% homologous to that in HIV-1
(Fig. 2).
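
The element sizes quoted above allow a quick check of how much of the genome-length difference the LTR elements could explain. The sketch below is a worked arithmetic example in Python; it assumes that the genomic RNA carries R-U5 at its 5' end and U3-R at its 3' end, as the junction descriptions in the text imply, and it treats the "about 9,200" figure for HIV-1 as exact. The dictionary and function names are hypothetical.

```python
# Element sizes in base pairs, as quoted in the text.
LTR_ELEMENTS_BP = {
    "HIV-2": {"U3": 556, "R": 173, "U5": 125},
    "HIV-1": {"U3": 456, "R": 97, "U5": 82},
}

# Genome lengths in nucleotides (RNA form); the HIV-1 value is approximate.
GENOME_RNA_NT = {"HIV-2": 9671, "HIV-1": 9200}


def ltr_length(elements):
    """Length of one proviral LTR: U3 + R + U5."""
    return elements["U3"] + elements["R"] + elements["U5"]


def ltr_derived_rna(elements):
    """LTR-derived sequence in the genomic RNA: R-U5 (5' end) plus U3-R (3' end)."""
    return elements["U3"] + 2 * elements["R"] + elements["U5"]


ltr_bp = {name: ltr_length(e) for name, e in LTR_ELEMENTS_BP.items()}
rna_nt = {name: ltr_derived_rna(e) for name, e in LTR_ELEMENTS_BP.items()}

ltr_share = rna_nt["HIV-2"] - rna_nt["HIV-1"]
genome_gap = GENOME_RNA_NT["HIV-2"] - GENOME_RNA_NT["HIV-1"]
```

Under these assumptions one LTR is 854 bp in HIV-2 against 635 bp in HIV-1, and the LTR-derived sequences account for 295 of the roughly 471 nucleotides by which the HIV-2 RNA is longer, which is consistent with the statement that the difference is only partly explained by the LTRs.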

The analysis of the virus-specific poly(A)+ RNA (not shown)
from a cell line infected with and continuously producing HIV-2
revealed a pattern of transcription reminiscent of that observed
in HIV-1-infected cells: RNA of over 9 kilobases (kb), corresponding
to a full-length transcript, and three types of spliced
messenger RNA of 5, 4.5 and 2 kb, also observed in HIV-1 (refs

8,^4). ■ : . - 

gag and pol proteins and HIV phylogeny

The gag precursor of HIV-2 has a calculated relative molecular
mass of 57,100 (Mr 57.1K), consistent with the p55 antigen
seen by immunoprecipitation with patient sera, and is probably
processed, by analogy with HIV-1, into the proteins designated
p16, p26 and p12 (refs 19, 20). By analogy with the p18 of
HIV-1, p16 would be at the amino terminus of gag and precede
p26, whose amino terminus has been sequenced (H. Marquardt,
personal communication) and starts with the proline residue at
position 951. The carboxy-terminal part of the gag precursor
encodes a p12 that contains the cysteine-rich consensus of the
retroviral nucleic-acid-binding proteins also found twice in the
n'** of HIV-1 (ref. 15). The HIV-2 pol ORF could encode the
" and p36 antigens of HIV-2 (ref. 20) which by analogy
correspond to the p68 and p34 (reverse transcriptase and
endonuclease, respectively) of HIV-1.
The gag and pol proteins of HIV-1 and 2 were expected to
share large conserved domains, as these HIV-2 proteins can be
precipitated by antibodies in sera from patients infected with
HIV-1. However, we found that only 58% and 59.4% of the
amino acids of gag and pol respectively are identical to the
corresponding HIV-1 products (Table 1a), whereas the more
distant isolates of HIV-1 (Zairian and US) show 90 to 95%
amino-acid identity in these proteins (Table 1b and ref. 12).
Several insertions and deletions have to be introduced in the
alignments (data not shown), whereas they are rare in the
comparisons of gag and pol genes between HIV-1 isolates. The
gag and pol proteins of HIV-2 are no closer to those of the
Zairian isolates than to the prototype HIV-1 (BRU isolate),
isolated in 1983 from a French patient probably infected in the
USA. Overall, the difference in gag and pol between HIV-1 and
HIV-2 is of the same order as that observed among the group
of the human T-cell leukaemia viruses (HTLV-I and II) and
bovine leukaemia virus (BLV). However, this latter group displays
a higher conservation in the envelope, 70% amino-acid
identity between HTLV-I and HTLV-II, versus about 42%
between HIV-1 and HIV-2 (see below). Alignments of different
retroviral pol proteins (Table 1b) confirm that the HIVs form a
subgroup that is more related to the lentiviruses visna and equine
infectious anaemia virus (EIAV) than to any other human or
animal retrovirus.
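
Percentages of identical residues such as those quoted above are computed from a pairwise alignment. The following Python sketch shows one common convention, in which columns containing a gap in one sequence count as compared but not identical; the article does not state which convention was used, so this choice, the function name and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues between two aligned sequences.

    Both strings must come from the same alignment and have equal length.
    Columns that are gaps in both sequences are ignored; a column with a
    gap in only one sequence counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Hypothetical six-column alignment: four identical columns out of six.
toy_identity = percent_identity("MKV-LT", "MRVALT")

# The same arithmetic applied to a count quoted in the text:
# 17 matching nucleotides out of 29 in the Sp1 region.
sp1_identity = 100.0 * 17 / 29
```

Other conventions (for example, dividing by the length of the shorter sequence or excluding gapped columns) give somewhat different percentages for the same alignment.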

Homologous domains in env

The envelope glycoproteins of retroviruses are translated from
a subgenomic viral mRNA (here probably the transcript of
4.5 kb). Addition of sugar residues (N-linked glycosylation)
gives rise to a high-Mr precursor which is processed by proteolytic
cleavage. The length of the leader sequence of the HIV-2
glycoprotein cannot be precisely determined by alignment with
that of HIV-1 (experimentally found to be 32 amino acids long)
because of a lack of sequence homology (Fig. 4). But the amino
terminus of env contains a relatively hydrophobic stretch in the
calculated hydropathy plot (not shown) that is probably the
signal peptide. The potential cleavage site between the external
envelope glycoprotein (120K) and the transmembrane protein
(previously thought to be the 36K antigen and now putatively
identified as a 40K antigen) is found at amino acid 505 (Fig. 4)
immediately after the Lys-Glu-Lys-Arg sequence. This cleavage
site aligns partly to one (Lys-Ala-Lys-Arg) of the two potential
cleavage sites found in HIV-1 (the other being located after the
Arg-Glu-Lys-Arg stretch). The calculated Mr of the extracellular
glycoprotein (EGP) and of the transmembrane protein (TMP)
of HIV-2 would be 57K and 41.7K respectively; the discrepancy
Fig. 3 HIV-2 strong-stop cDNA corresponding to the length of
the R-U5 elements of the HIV-2 LTR. The methods were previously
described. Briefly, virions were purified by ultracentrifugation
and an endogenous cDNA reaction performed with radiolabelled
nucleotides after mild disruption of viral envelope with
Triton X-100. The tRNA(Lys) primer, complementary to the PBS site
flanking the U5 element at the 5' end of the genome, was then
degraded by alkaline hydrolysis and the cDNA run on a denaturing
6% acrylamide-urea gel together with a sequence reaction for
accurate estimation of the size of the products.

with the apparent Mr of the EGP is explained by glycosylation
(30 sites in HIV-2, about half of which are conserved with respect
to HIV-1).

Figure 4 shows an alignment of the envelopes of the two
HIVs. The proteins are overall very distantly related (41.7%
identity in the entire envelope, 39.4% in the EGP, 44.8% in the
TMP) compared to divergent isolates of HIV-1 (about 75-80%
identity in the whole envelope, ref. 12). Many large insertions
have to be introduced, particularly in alignment of the EGPs
where only short, widely separated domains are conserved
between HIV-1 and 2. These domains are clustered into the
conserved regions of the EGP of HIV-1 (identified by comparison
of different isolates), and generally coincide with
cysteine residues. Among the HIV-1 isolates, all the cysteine
residues could be aligned in spite of the generally large genetic
variation, especially in gp110. Almost all (22/23) of the cysteine
residues of HIV-1 can also be aligned with HIV-2, but the latter
contains seven additional cysteine residues, often in the regions
representing insertions relative to HIV-1. Thus, the folding of
the HIV-2 EGP could be different from that of HIV-1, and some
regions, therefore, might be exposed in a different manner.

Other viral proteins 

The HIV-1 genome contains several other genes encoding proteins
of small Mr (10 to 27K), two of which (tat and art/trs)
have an identified function: the positive regulation of viral
expression. No role has yet been identified for the p23
encoded by ORF Q (or sor), nor for the p27 encoded by
ORF F (or 3' ORF). We also observed in the region between
the pol and env genes of HIV-1 (central region) another potential
gene, which we designated R (ref. 12). All these elements are
found in HIV-2, but the corresponding proteins are only distantly
homologous (see Table 1a). In the F protein, most of the
difference between HIV-1 and 2 is due to a large insertion in
the amino terminus of HIV-2. The second half of the protein,
encoded by the U3 element of the LTR, shows better conservation
(data not shown).

Based upon sequence homologies with HIV-1, the tat and art
genes of HIV-2 are probably organized as split genes transcribed
into ~3 kb mRNA made of three exons [...] the 5' leader, [...]
first coding exon located in the central region and probably
ending at a possible splice donor found at position 6,1[...]
(CAAGT, Fig. 2), and a last exon probably starting at the splice
acceptor at position 3,307 in HIV-2 (CAG ATC). The tat protein
of HIV-2 would be longer than that of HIV-1 (130 versus 86
amino acids), having two large insertions in the amino terminus
and in the second coding exon (Fig. 4). The main domain of
homology of the tat proteins corresponds to a region very rich
in cysteine residues whose structure is reminiscent of that of the
'cysteine fingers' of some transcription-regulating elements that
interact with nucleic acids, such as the TFIIIA factor. This
region is followed by an Arg-Lys-rich stretch that could also
interact with DNA or RNA. No significant homology is seen
in the second coding exon, which has been shown to be dispensable
to the function of the protein. The art-encoded protein
is shorter in HIV-2 than it is in HIV-1 (100 versus 116 amino
acids), and most of its length is encoded by the last exon. The
most homologous part is located in a stretch of basic residues
that may be able to interact with nucleic acids.

Cross-transactivation of HIV-1 and HIV-2 

The trans-activator gene (tat) has been shown to be indispensable
for the replication and cytopathicity of HIV-1 (ref. 41).

Table 1 Quantification of the homologies among retroviral proteins

            HIV-2        HIV-1        HTLV-I       VISNA
HIV-1       59.1 (96.4)  -            ND           ND
LAV-Eli     61.6 (96.1)  94 (98.7)    ND           ND
LAV-Mal     59 (95.2)    92 (98.7)    ND           ND
EIAV        43.8 (92)    41.9 (91.5)  ND           46.7 (90.8)
VISNA       43.7 (88.7)  42.2 (94)    ND           -
HTLV-I      34.8 (70.5)  33.3 (70.3)  -            ND
HTLV-II     ND           ND           62.8 (99.5)  ND
BLV         ND           ND           49.5 (93.2)  ND
RSV         35.9 (72.3)  34.5 (76.2)  38.2 (86.4)  ND


The reference protein of each alignment is that listed at the top of
the column. Proteins were aligned using the NUCALN program with
following parameters: K-tuple 1, window 20, gap penalty 1. Two results
are indicated in each case: the amino-acid identity (%) in the aligned
domains (that is, excluding the regions of insertion/deletion), and
between parentheses the percentage of the length of reference protein
that could be aligned. a, Homologies between HIV-1 and HIV-2 proteins.
For env, the calculation was done for the external glycoprotein (EGP,
including the signal peptide, whose length is not exactly known in
HIV-2), and the transmembrane protein (TMP). b, Comparison of the
pol-encoded proteins of different retroviruses. LAV-Mal and LAV-Eli
are Zairian isolates of HIV-1 (ref. 12); EIAV, equine infectious anaemia
virus (sequence communicated by Dr S. Aaronson), and visna virus
are animal lentiviruses; HTLV-I, HTLV-II, BLV, related leukaemogenic
retroviruses; RSV, Rous sarcoma virus. ND, not determined.
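
The two figures reported in each cell of Table 1 can be illustrated with a small calculation. The Python sketch below is not the NUCALN program. It assumes a pairwise alignment is already available as two equal-length gapped strings, and it treats every ungapped column as part of an aligned domain, which simplifies the definition given in the legend. The function name alignment_stats and the toy sequences are hypothetical.

```python
def alignment_stats(ref_aln, other_aln, gap="-"):
    """Return (identity_pct, coverage_pct) for two gapped alignment rows.

    identity_pct: identical residues as a percentage of ungapped columns
                  (regions of insertion/deletion are excluded).
    coverage_pct: ungapped columns as a percentage of the reference
                  protein length (the reference row without gaps).
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned rows must have the same length")

    ref_len = sum(1 for c in ref_aln if c != gap)
    aligned = 0
    identical = 0
    for a, b in zip(ref_aln, other_aln):
        if a == gap or b == gap:
            continue  # column belongs to an insertion/deletion
        aligned += 1
        if a == b:
            identical += 1

    if ref_len == 0 or aligned == 0:
        raise ValueError("no aligned residues to compare")

    return 100.0 * identical / aligned, 100.0 * aligned / ref_len


if __name__ == "__main__":
    # Toy rows, not real HIV sequences.
    identity, coverage = alignment_stats("MKT-LLA", "MRTGL-A")
    print(f"{identity:.1f} ({coverage:.1f})")
```

The printed form mirrors the table layout: identity first, then the aligned fraction of the reference in parentheses.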



[Fig. 4 alignment panels (signal peptide, EGP, TMP); the sequence text is not recoverable from the scan.]


Fig. 4 Alignments of the HIV-1 (BRU isolate,
ref. 15) and HIV-2 proteins. Asterisks indicate
amino-acid identities. Gaps were introduced to
optimize the alignments. In the envelopes, the
potential cleavage sites are shown by arrows.
EGP, external glycoprotein; TMP, transmembrane
protein. O, Potential N-glycosylation
sites; cysteines. The domains of the EGP of
HIV-1 that were found to be well-conserved
among isolates are underlined. The parts of
tat and art encoded by each of the two exons
are separated by an arrow.
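
The potential N-glycosylation sites and cysteines marked in Fig. 4 can be located programmatically. The sketch below assumes the usual sequon rule (Asn-X-Ser/Thr, where X is any residue except Pro), which the legend does not state explicitly. The function names and the toy peptide are hypothetical and are not taken from the HIV sequences.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. N-N-S-T) are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def cysteine_positions(protein):
    """Return 1-based positions of every cysteine residue."""
    return [i + 1 for i, aa in enumerate(protein.upper()) if aa == "C"]


if __name__ == "__main__":
    toy = "MNGTCNPSANNSTC"  # illustrative peptide only
    print("N-glycosylation sites:", n_glycosylation_sites(toy))
    print("cysteines:", cysteine_positions(toy))
```

Counting the two lists for each envelope sequence would give the kind of tallies quoted in the text (about 30 potential sites in HIV-2, 22 of 23 HIV-1 cysteines alignable).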
To examine whether transactivation (a property also shared with
the ovine visna lentivirus but not with the related caprine arthritis
and encephalitis virus) exists in HIV-2, we constructed a
plasmid, called pHIV2-CAT, containing the bacterial chloramphenicol
acetyltransferase (CAT) gene under the control of the
U3-R region of HIV-2 (225 bp of U3 and 175 bp of R). To test
the transactivation of HIV-2, cells were either infected with
HIV-2 or mock-infected, and five days later transfected with
either pSVCAT (which contains the CAT gene under control of
the SV40 early promoter) or pHIV2-CAT. At the time of
transfection, the cells were not producing virus. Nonetheless,
we observed a substantial increase in the amount of CAT
expression in extracts of HIV-2-infected versus mock-infected
cells that had been transfected with pHIV2-CAT (Fig. 5a). The
expression of the SV40 early promoter was not affected by HIV-2
infection.

To determine whether the tat gene of HIV-1 could transactivate
the LTR of HIV-2 and vice versa, we cotransfected SW480
cells with subgenomic fragments of HIV-1 or HIV-2 and
pHIV2-CAT or a plasmid called pHIV1-CAT, which contains
U3-R of HIV-1 (the entire U3 and 70 bp of R) directing transcription
of the CAT gene. The plasmid pLET (a gift from Dr
S. Wain-Hobson) contains the region of HIV-1 shown by
others to encode the HIV-1 tat gene. The plasmid pME214,
on the other hand, contains HIV-2 sequences between nucleotides
5,786 and 8,571 (Fig. 2), and in particular contains the
open reading frames of HIV-2 that share homology with the tat
gene of HIV-1. In both of these plasmids transcription is driven
by the LTR of the respective virus, and the first AUG of the
transcript is the first AUG of the putative tat gene. It should
be noted that both these plasmids also contain the coding
potential for the art gene.

Although the SV40 early promoter was not affected by either
the HIV-1 tat or the HIV-2 tat gene, both HIV-1 and HIV-2
LTRs were substantially activated by the HIV-1 tat gene
(Fig. 5b). This is perhaps surprising in view of the difference in
size of the R region of HIV-1 (where the transactivator responsive
region (TAR) resides) and HIV-2. However, 35 of the 58
bases present in the first stem-and-loop secondary structure of
the TAR region of HIV-1 are conserved, and an analogous
stem-and-loop structure with the first 77 bases of R can be drawn
for HIV-2 (ref. 33).

The HIV-2 LTR is transactivated over 100-fold by pME214
(Fig. 5b). On the other hand, the HIV-1 LTR is not as well
transactivated by this plasmid (~5-20 fold, Fig. 5 and other
data not shown). Similar results were obtained after transfection
of HeLa and HUT 78 cells (data not shown). These experiments
indicate that pME214 encodes a functional tat gene. In addition,
they indicate that the specificity of the HIV-2 tat is somewhat
different from that of the HIV-1 tat. It will be important to
determine whether this observation is isolate-specific.

Origin of human immunodeficiency viruses

We have presented here the complete nucleotide sequence of
the retrovirus associated with AIDS in West Africa, HIV-2, and
tentatively identified the viral proteins either detected in
Fig. 5 Transactivation of HIV-1. Chloramphenicol
acetyltransferase (CAT) assays were done as described. The
unreacted chloramphenicol is marked 'CAM', and the acetylated
products are marked 'AcCAM'. All reactions were 1 h with 10%
of the cellular extract made 40 h after transfection. The origin of
the promoter linked to the CAT gene is indicated above each lane.
SV40 indicates the SV40 early promoter, HIV-2 indicates the partial
U3 and the entire R sequences of HIV-2 (ROD isolate), and HIV-1
indicates the entire U3 and 70 bp of R of HIV-1 (BRU isolate). a,
HUT 78 cells were either mock-infected (UN, uninfected) or infected
(IN) with HIV-2. Five days post-infection, 3 x 10^[...] cells were
transfected with 3 µg of plasmid in 0.5 ml of Tris-saline without
divalent cations for 45 min at 37 °C with 250 [...] ml^-1 DEAE-Dextran.
b, [...] SW480 cells were cotransfected by the CaCl2
technique with 3 ng of promoter-CAT plasmid and 3 µg of the
indicated plasmid. Salmon sperm DNA was added such that each
transfection was 20 µg ml^-1 DNA. This experiment was repeated
three times with similar results.

immunoprecipitations with patients' sera, or homologous to
proteins previously identified in HIV-1. The two viruses share
a similar genomic organization, indicating a common evolutionary
origin, but differ significantly in terms of nucleotide and
amino-acid sequence: the more-conserved gag and pol genes
respectively display only 56 and 60% nucleotide sequence
homology and both less than 60% amino-acid identity. The
calculation of the nucleotide sequence homology for the other
genes gives even lower values, making HIV-1 and 2 42%
homologous overall. This confirms that these two viruses are
distinct elements of the HIV family, and cannot be considered
as strains of the same virus, according to the recommendations
of the international taxonomy committee.

It was previously established that HIV-2 is more related to
the simian immunodeficiency viruses (SIV) than it is to HIV-1.
The gag, pol and env proteins of SIV and HIV-2 are antigenically
cross-reactive, whereas their cross-reactivity to HIV-1 is restricted
to some gag and pol antigens. The amino-terminal amino-acid
sequence of the major core protein (corresponding to the
p25gag of HIV-1 and p26gag of HIV-2) has been determined in
one isolate of SIV obtained from macaques with an AIDS-like
disease (MnIV, ref. 26). Out of the 23 amino acids sequenced,
21 match with the amino terminus of p26gag of HIV-2, whereas
13 (with one deletion) match to the p25gag of HIV-1. Furthermore,
whereas HIV-2 can infect, at least transiently, primate
species which are evolutionarily more distantly related to
humans (at least baboons and macaques), HIV-1 infects only
humans and chimpanzees (R. Desrosiers and P. Fultz, personal
communications). In fact, it is not possible from current data
to know whether SIV can be classified as distinct from HIV-2
or if they only differ as independent isolates of the same virus.

The almost simultaneous emergence of two foci of AIDS in
distinct areas of the African continent is unlikely to be due to
the recent emergence of two novel human pathogens, for
example by simultaneous trans-species infection by animal
retrovirus, or by the mutation of pre-existing non-pathogenic
human retroviruses. Indeed, HIV-1 and HIV-2 are obviously
retroviruses with a common origin, but they are highly divergent,
and it is more likely that their time of divergence is earlier than
the beginning of the current epidemics. Therefore a common
ancestor, with similar properties and pathogenic potential, probably
existed a long time ago in a human population, and the
emergence of the AIDS epidemics is more likely the result of
simultaneous modifications of epidemiological parameters in
West and Central Africa, such as uncontrolled urbanization
leading to the infection of larger populations.

A question to be addressed is why the HIVs were only recently
detected if they existed for a long period. This may be due to
the fact that the pathogenicity of an HIV-type retrovirus cannot
be revealed until it has spread to a significant portion of the
population. First, in areas of Africa with poor medical facilities,
where other infections, such as malaria, represent primary causes
of morbidity, isolated cases of AIDS could have been an
undetectable clinical event. Then, the incubation time can vary
considerably, and it cannot still be ruled out that a large fraction
of individuals infected by a HIV will remain healthy carriers.
In Kenya, HIV-1 seropositivity was first reported in a high
fraction of subjects at risk of AIDS (female prostitutes) who
were apparently healthy; later, the virus diffused to a larger part
of the population, and cases of AIDS were observed. A similar
situation could explain the apparent lack of pathogenicity of
the retrovirus designated HTLV-IV, but indistinguishable from
HIV-2 and SIV by the antigenicity of its proteins. The
presence of HTLV-IV was identified only in apparently healthy
individuals in West Africa, an area where we have observed
several typical AIDS cases caused by HIV-2. It is possible that
the apparent non-pathogenicity of HTLV-IV is due to a recent
epidemic diffusion of HIV-2/HTLV-IV in West Africa, where
AIDS cases still represent a minor fraction of the infected and
seropositive individuals, whereas HIV-1 has diffused in major
cities of central Africa or the USA some time before.

Implications for vaccines and diagnostics 

The risk that HIV-2-infected blood samples may not be detected
by standard screens, currently based on the detection of anti-HIV-1
antibodies, makes it important that a way of diagnosing
HIV-2 infection is found. As the envelope, and especially its
transmembrane part, represents the primary target of the host
antibody response to the HIV infection (see ref. 1), antigens
from the envelope of HIV-2 will significantly improve the spectrum
of the screening tests, allowing the detection of samples
infected by HIV-2, and perhaps by other as yet uncharacterized
members of the HIV family.

As it shares most of the structural characteristics and biological
properties of HIV-1, but displays significant genetic divergence,
HIV-2 is a powerful tool in the study of the molecular
biology of this group of retroviruses. Among the crucial biological
properties common to both HIVs are tropism for CD4-positive
cells, and mechanisms of positive regulation of viral
expression encoded by viral transactivating factors. We observed
that the tat of HIV-1 activates the transactivation responsive
(TAR) sequences as efficiently in both types of HIV, whereas
the tat gene of HIV-2 is more efficient on the TAR elements of
HIV-2. The tat proteins of HIV-1 and 2 have only short
homologous sequences, and this will ease the dissection of their
function by mutagenesis or using chemically synthesized peptides.

HIV-1 and probably HIV-2 recognize the CD4 surface
molecule as a receptor on helper/inducer T lymphocytes and
perhaps on other cells expressing the CD4 protein. In HIV-1,
this interaction is mediated by the external envelope glycoprotein
(EGP; ref. 52), and an important problem is which
of the domain(s) of this protein are involved in that interaction.
Indeed, blocking this step of the virus life cycle, either by
antibodies or drugs, could be an efficient means for preventing
infection or blocking its spread. As the receptor is a constant
cellular protein, we can postulate that the binding domain of
the envelope is conserved among the CD4-tropic HIVs. The
conserved domains of the EGP of HIV-1 and 2 are not numerous,
and therefore it becomes possible to demonstrate their possible
role in the virus-receptor interaction using a relatively limited
set of site-directed mutations. Given the absence of antigenic
cross-reactivity of the envelopes of the two HIVs, this CD4-binding
domain is probably not, or only poorly, immunogenic,
perhaps because of masking by glycosylation, poor exposure
on the virion surface, or mimicking of 'self' antigens. Nevertheless,
its presentation to the immune system out of context of
the virion, that is, as a peptide, might induce a neutralizing
antibody response that is not attained, or attained with only a
low efficiency, with the complete native envelope from virions
or expression systems.

Conclusion 

The comparative analysis of HIV-1 and 2 reveals major genetic
differences between retroviruses that share many of their biological
properties. They both cause AIDS, are cytopathic in vitro,
have a tropism for CD4-bearing cells and have elements transactivating
the expression of viral genes acting at the LTR level.
The evolutionary potential of these viruses is therefore striking,
and we must ask whether other HIVs can emerge as long as a
favourable epidemiological situation is provided. We must take
advantage of the precise delineation of the conserved structures
to understand their molecular biology and develop new
therapeutic tools, especially immunoprophylactics.
Recently, Wu et al. and Hor et al. have shown that
Y1.2Ba0.8CuO4-δ is a superconductor with a superconducting onset
temperature at ~92 K as determined by their resistivity and a.c.
susceptibility measurements. Because the magnetic properties are
important in describing the nature of superconductivity, we have
measured the d.c. magnetic moment of this material. Here we show
that this material cooled in zero field or in a high field (H >
90 G) is diamagnetic below Tc ≈ 90 K, consistent with the previous
measurements. However, when the sample is cooled in a small
field (<85 G), the magnetization first becomes negative
(diamagnetic) below Tc but further cooling results in a jump of
M to a positive value at low temperature. We have also observed
this switching by the application of an additional small field when
the sample was cooled in a small field.

The Y1.2Ba0.8CuO4-δ sample was prepared as described in
ref. 1. The X-ray diffractograms reveal that the sample has
multiple phases, devoid of the K2NiF4 structure. From the
electrical resistance measurement, the superconducting onset
temperature is ≈94.5 K and the resistance becomes 'zero'
below T0 ≈ 92 K, indicating that the sample is a superconductor
with a rather narrow transition width. A Quantum Design superconducting
quantum interference device (SQUID) magnetometer
has been employed to measure the magnetization of
the sample as a function of temperature and magnetic field.
When the sample is cooled under zero field conditions, we have
found that M is diamagnetic below Tc and the susceptibility
below ~25 K reaches ~35% of that of perfect diamagnetism
(-1/4π).

We have also measured M when the sample is cooled in a
field. In Fig. 1, the magnetization obtained at various
Three novel genes of human T-lymphotropic virus type III: Immune
reactivity of their products with sera from acquired immune
deficiency syndrome patients

(sor, tat and 3' orf genes/cDNA cloning/double splicing/in vitro translation/immunoprecipitation)

Suresh K. Arya and Robert C. Gallo

Laboratory of Tumor Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20205

Communicated by Peter C. Nowell, November 13, 1985

ABSTRACT Human T-lymphotropic virus type III or
lymphoadenopathy associated virus (HTLV-III/LAV) is the
cause of acquired immune deficiency syndrome (AIDS). In
addition to the conventional retroviral genes involved in virus
replication, namely, gag, pol, and env genes, DNA sequence
analysis of HTLV-III genome predicted two additional open
reading frames termed by us short open reading frame (sor)
and 3' open reading frame (3' orf). Furthermore, functional
analysis revealed another gene with transactivating function,
termed tat. We have now structurally identified and functionally
characterized these HTLV-III specific genes by way of
cDNA cloning. DNA sequence analysis of the clones shows that
the tat and 3' orf genes contain three exons and their transcription
into functional mRNA involves two splicing events
and that the sor gene contains at least two exons. In vitro
transcription and translation of the cloned spliced sequences
show that the sor, tat, and 3' orf genes code for polypeptides
with apparent mobility of 24-25 kDa, 14-15 kDa, and 26-28
kDa, respectively. All three polypeptides are immune reactive
and are immunogenic in the natural host. The results demonstrate
that the three extra open reading frames of HTLV-III,
two of which are unique to HTLV-III, are in fact genes that
function in vivo and further allow the identification of three new
and previously unrecognized HTLV-III antigens with differential
immunogenicity in individuals with acquired immune
deficiency syndrome and related disorders.



Human T-lymphotropic virus type III (HTLV-III) or the
lymphoadenopathy associated virus (LAV) is etiologically
linked to acquired immune deficiency syndrome (AIDS) and
AIDS-related complex (ARC) (1-4). The overall genetic
structure of HTLV-III/LAV is similar to that of other animal
retroviruses. However, besides gag, pol, and env genes,
DNA sequence analysis of HTLV-III/LAV genome predicted
two additional open reading frames or potential genes
(5-8), termed by us and others sor (short open reading frame)
and 3' orf (3' open reading frame). The presence of a third
gene, termed tat (transactivation of transcription), was also
suggested (9-12). Thus, HTLV-III/LAV contains coding
potential for three genes that are specific to this virus. Two
of these putative genes, sor and 3' orf, are unique to
HTLV-III but a functional analog of the third gene, tat, is also
carried by other members of the HTLV-bovine leukemia
virus (BLV) group of retroviruses (9-13). We and others have
localized the tat gene of HTLV-III to a region between the
putative sor and the env genes (10, 12), a region of the genome
previously thought to be noncoding. This is distinct from the
other members of the HTLV-BLV group where tat gene is
located downstream from the env gene. Thus, even the tat
gene is organized differently in HTLV-III. We report here
that sor, tat, and 3' orf genes all contain intron(s) and are
respectively translated into polypeptides with apparent mobility
of 24-25 kDa, 14-15 kDa, and 26-28 kDa on
NaDodSO4/PAGE. These gene products display differential
immune reactivity for HTLV-III positive human sera, the 3'
orf gene product being the most immune reactive. The results
demonstrate the existence of three new HTLV-III antigens.

MATERIALS AND METHODS 

cDNA Cloning and DNA Sequencing. Poly(A)-selected
RNA from HTLV-III-infected H4 cells, isolated as described
(14, 15), was used to construct cDNA libraries as reported
(10). The libraries were screened with subgenomic HTLV-III
probes to obtain clones containing specific HTLV-III sequences
(10, 11). The selected clones were characterized by
restriction mapping and DNA sequencing by the method of
Maxam and Gilbert (16).

In Vitro Transcription and Translation. The inserts of
selected cDNA clones were transferred to the vector pSP6
that transcribes inserted DNA under the influence of SP6
promoter (17). RNA was transcribed in vitro after linearization
of the plasmid DNA with specific restriction enzymes. It
was translated in vitro by using rabbit reticulocyte translation
system and [35S]methionine, and the products were analyzed
by 12% NaDodSO4/PAGE and radioautography by the
standard procedures.

Immunoprecipitation with Human Sera. The in vitro translation
products were incubated with normal human serum for
1-2 hr at 4°C. Suspension of Staphylococcus aureus (Staph
A) cells was then added, and incubation was continued for an
additional 1 hr. The sample was centrifuged, and the supernatant
was divided into two equal parts, one of which was
incubated with immune serum at 4°C for 18-24 hr. A
suspension of Staph A was added to each sample and
incubated at 4°C for 1 hr. The samples were centrifuged, and
the pellets were repeatedly and sequentially washed with 50
mM Tris-HCl (pH 7.4)/50 mM EDTA/0.05% Nonidet P-40/1%
aprotinin containing 0.5 M NaCl or 0.15 mM NaCl.
The pellets were suspended in 75 mM Tris-HCl (pH 6.8)/0.7
mM 2-mercaptoethanol/2% (wt/vol) NaDodSO4/10% (vol/vol)
glycerol/0.001% bromophenol blue, boiled for 10 min,
and centrifuged. The supernatants were subjected to 12%
NaDodSO4/PAGE analysis.

RESULTS 

cDNA Clones of HTLV-III Specific Genes. To identify
HTLV-III specific genes, we took the direct approach of

Fig. 1. (Legend appears at the bottom of the opposite page.)

Medical Sciences: Arya and Gallo

obtaining functional cDNA clones by screening cDNA libraries
with specific subgenomic HTLV-III probes to obtain the
desired clones. We have previously described a functional
cDNA clone (clone 1) corresponding to the mRNA of the tat
gene (10). We also described in this previous report a second
cDNA clone (clone 3) that we speculated may correspond to
the mRNA of 3' orf gene. Both of these clones were copies
of the mRNAs that were generated by double splicing events.
Thus, the tat gene and the putative 3' orf gene consisted of
three exons and two introns (Fig. 1). We have now obtained
another cDNA clone (clone 12) that contains the complete
open reading frame of the putative sor gene, in addition to the
open reading frames of the tat gene and the putative 3' orf
gene (Fig. 1). DNA sequence analysis of clone 12 [2304 base
pairs (bp)] showed it to be an incomplete cDNA clone as it
lacked the mRNA cap site and possibly other sequences on
the 5' side of the sor open reading frame. However, it
contained the 3'-splice junction that was identical to the
3'-splice junction of clones 1 and 3 (Fig. 1).

Translation Products of HTLV-III Specific Genes. To characterize
the gene products of the putative sor, tat, and 3' orf
genes, the cDNAs were transferred to the transcription vector
pSP6, and the plasmids containing clones 1, 3, and 12 cDNA
inserts were designated pSP-1, pSP-3, and pSP-12, respectively
(Fig. 2a). RNA was transcribed after linearization of
the plasmid DNAs with specific restriction enzymes that
were chosen because they will either retain a given open
reading frame as a part of pSP6 transcriptional unit or delete
it. The transcription of the plasmid DNAs cleaved with
specific restriction enzymes gave RNA transcripts of the
appropriate sizes (data not shown). These transcripts were
translated and products analyzed. Representative results are
shown in Fig. 2. The transcripts of pSP-1 DNA linearized
with Xba I or Sma I gave two polypeptides with apparent
mobility of 25-26 kDa and 14-15 kDa, the 14-15 kDa
polypeptide being in smaller relative amounts. Digestion of
this plasmid DNA with BamHI or Xho I, which deletes 3' orf
open reading frame from the transcriptional unit, gave only
the polypeptide with 14-15 kDa apparent mobility. These
results suggest that 25-26 kDa and 14-15 kDa polypeptides
were products of the 3' orf and tat open reading frames,
respectively. While pSP-12 DNA linearized with Xba I
displayed three polypeptides of 25-26 kDa, 23-24 kDa, and
14-15 kDa, this DNA linearized with BamHI or Xho I gave
only two polypeptides of 23-24 kDa and 14-15 kDa apparent
mobility (Fig. 2). These results again suggest that 25-26 kDa
and 14-15 kDa polypeptides are the product of the 3' orf and
tat open reading frames, respectively, and further suggest
that 23-24 kDa polypeptide is the product of the sor open
reading frame.

The transcripts of Xba I linearized pSP-3 DNA, which
contains only the 3' orf open reading frames, though not
always translated efficiently, displayed a distinct polypeptide
with apparent mobility of 27-28 kDa. This polypeptide was
not detected when pSP-1 DNA was linearized with BamHI or
Xho I, which removes the 3' orf open reading frame from the
transcriptional unit. These results suggest that the 3' orf open
reading frame contained in pSP-3 DNA was being translated
into a 27-28 kDa polypeptide. The plasmid DNAs containing
cDNA inserts in the incorrect orientation with respect to the
SP6 promoter gave transcripts of the appropriate sizes but
none of these transcripts were translated into distinct
polypeptides (Fig. 2).
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Fig. 2. (a) Physical map of pSP-1 containing HTLV-III cDNA 
clone 1. pSP-3 and pSP-12 were similarly constructed, (c) and fnc) 
refer to the correct and noncorrect orientation of the cDN.A insert 
with respect to the SP6 promoter, (b) NaDodS04/PAGE analysis of 
the translation products of the transcripts from pSP-1. pSP-12. and 
pSP-3 plasmid DNAs. Lanes 1 and 2. Xba 1- and ficmHI-digcsted 
pSP-Uc) DNA; lanes 3 and 4, Xba I- and 5amHI-digested pSP-Ifnc) 
DNA; lanes 5 and 6. Xba I- and 5amHI-digestcd pSP-12(c) DNA; 
lanes 7 and 8. Xba l- and BamHI-digcstcd pSP-12rnc) DNA; lanes 9 
and 10. Xba I- and SamHI-digested pSP.3(c) DNA; lanes U and 12. 
Xba I- and Ba/nHi-digesied pSP-3(nc) DNA; lane M, molecular size 
standards. 

Since clone 12 contained the tat open reading frame in
addition to the sor and 3' orf open reading frames, we tested
its transactivating capacity in a transfection system that
measures transactivation of the chloramphenicol acetyl
transferase (CAT) gene (see ref. 10). Representative results
for human lymphoid JM cells are shown in Fig. 3. Clearly,
clone 12 DNA transactivated the CAT gene activity. Thus, the
tat open reading frame contained in clone 12 was transcribed
and translated into a functionally active polypeptide.

Immune Reactivity of HTLV-III Specific Gene Products. To
evaluate the immune reactivity of the polypeptides directed
by the sor, tat, and 3' orf open reading frames, translation
products were immune precipitated with HTLV-III-positive
human sera from several individuals. Representative results
are shown in Figs. 4 and 5, and data are compiled in Table 1.
HTLV-III-positive serum specifically immune precipitated a



Fig. 1 (on opposite page). (a) Physical maps of HTLV-III cDNA clones 1, 3, and 12. The two splice junctions for clones 1 and 3 and one
splice junction for clone 12 are indicated. The nucleotide numbering in parentheses is according to Ratner et al. (6). (b) DNA sequence of
HTLV-III cDNA clone 12. The three open reading frames contained in this clone along with the predicted amino acid sequences are shown.
DNA sequences of clones 1 and 3 have been reported before (10). The open reading frame for the tat gene is in a different frame than those
of the sor and 3' orf genes.

Fig. 3. Enhancement of HTLV-III LTR-promoted CAT gene
expression by HTLV-III cDNA clone 12. Human lymphoid JM cells
were cotransfected with clone 12 DNA in expression vector pCV
(pCV-12) and HTLV-III LTR-CAT (pC15-CAT) plasmid DNA by the
DEAE-dextran protocol (10). CAT gene product activity in the extract
of transfected cells was measured by analyzing the conversion of
[14C]chloramphenicol (Cm) into its acetylated forms (AcCm) by thin
layer paper chromatography. Lanes 1 to 6 are respectively for cells
transfected with DNAs of pSV0CAT (1), pRSVCAT (2), pC15-CAT (3),
pC15-CAT plus pCV-HXb3 (4), pC15-CAT plus pCV-12 (correct
orientation) (5), and pC15-CAT plus pCV-12 (incorrect orientation) (6).

predominant polypeptide of 25-26 kDa for Xba I as well as Sma
I linearized plasmid pSP-1 DNA. The 25-26 kDa polypeptide
was also specifically immune precipitated from the translation
products of Xba I as well as Sma I linearized plasmid pSP-12
DNA. Similar results were obtained with plasmid pSP-3 DNA,
except the apparent size of this polypeptide was 27-28 kDa (Fig.
4). The marginal detection of this polypeptide in translation
products of BamHI-digested pSP-1 and pSP-3 plasmid DNAs
was probably the result of incomplete enzyme digestions; it was
not detected for BamHI-digested pSP-12 plasmid DNA. Instead,
translation products of BamHI-digested pSP-12 plasmid
DNA displayed a band at 23-24 kDa that was immune precipitated
with HTLV-III-positive serum but also to a lesser extent
with some normal human sera (see Table 1). Consistent with our
interpretation of the translation products noted above, we infer
that the 2^28 kDa and 23-24 kDa polypeptides are the immune 
reactive products of 3' orf and sor open reading frames,
respectively. Immunoprecipitation of the 14-15 kDa tat gene
product was not obvious with this particular HTLV-III-positive
serum but could be detected to varying extent by some of the
other HTLV-III-positive sera as shown in Fig. 5 and listed in
Table 1.

DISCUSSION 

The HTLV-III open reading frames termed sor, tat, and 3' orf
are specific to this virus and two of these, sor and 3' orf, are
unique to it. The products of the three HTLV-III specific
open reading frames immune react with antibodies in sera of
individuals with AIDS and ARC. Therefore, these open
reading frames are in fact genes that are expressed in vivo.

Fig. 4. NaDodSO4/PAGE analysis of immune precipitates of
translation products of pSP-1, pSP-12, and pSP-3 DNA transcripts.
Lanes 1 and 2, Xba I- and BamHI-digested pSP-1 DNA; lanes 3 and
4, Xba I- and BamHI-digested pSP-12 DNA; lanes 5 and 6, Xba I- and
BamHI-digested pSP-3 DNA. Sublanes (a) and (b) are for
HTLV-III-positive and normal human serum, respectively. Lane M,
molecular size standards.

Fig. 5. NaDodSO4/PAGE analysis of immune precipitates of
translation products of Xho I-digested pSP-1 (a) and pSP-12
DNA (b) transcripts. Lanes 1 and 2 are for two different immune
sera and lane 3 is for normal serum. Lane M, molecular size
standards. Analysis was performed as described in Fig. 4.
(The 25-26-kDa band is the 3' orf gene product, presumably
the result of incomplete enzyme digestion of the plasmid DNA.)

Our results allow the structural definition of the three
HTLV-III specific genes. We have previously characterized
the functional domain of the tat gene (10). Like the tat gene,
the 3' orf gene (clone 3) also consists of three exons (287 bp,
69 bp, and 1258 bp) and two introns (5268 bp and 2330 bp),
and its transcription into a functional mRNA involves double
splicing. The 3' orf gene differs from the tat gene in having a
truncated second exon involving splicing out of the putative
initiation codon of the tat gene product (10).

The sor gene contains at least two and probably three
exons. The 3' exon (1258 bp) of this gene (clone 12) is
identical to the third exon of the tat and 3' orf genes. The
sequences on the 5' side (1114 bp) of this exon in clone 12 are
shared with the second exon of the tat gene and extend
upstream to include the sor open reading frame. We suspect that
the generation of the sor mRNA also involves two splicing
events. It is possible that the synthesis of this mRNA
involves the same first donor site (at nucleotide 287) as other
mRNAs and one of the many potential acceptor sites located
to the 5' side of the sor open reading frame. If the consensus
acceptor site nearest to the 5' side of the sor open reading
frame located at nucleotide 4494 is utilized, the functional sor
gene will generate a message of about 2.7 kilobases (kb).
However, if the sor message involves only one splicing event
demonstrated in clone 12, the mRNA would be about 7.0 kb.
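The exon and intron sizes quoted above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' calculation: the function names are hypothetical, poly(A) tails are ignored, and the doubly spliced sor message is assumed to consist of the 287-bp leader, the 1114-bp region found in clone 12, and the shared 1258-bp 3' exon.

```python
def spliced_length(exon_lengths_bp):
    """Length in bp of an mRNA formed by joining the given exons."""
    return sum(exon_lengths_bp)


def unspliced_span(exon_lengths_bp, intron_lengths_bp):
    """Length in bp of the genomic region covering all exons and introns."""
    return sum(exon_lengths_bp) + sum(intron_lengths_bp)


# Exon and intron sizes quoted in the text for the 3' orf gene (clone 3).
orf3_exons = [287, 69, 1258]
orf3_introns = [5268, 2330]

# Assumption (not stated explicitly in the text): a doubly spliced sor
# message made of the 287-bp leader, the 1114-bp region of clone 12,
# and the common 1258-bp 3' exon.
sor_exons_double_spliced = [287, 1114, 1258]

print(spliced_length(orf3_exons))                # 1614
print(unspliced_span(orf3_exons, orf3_introns))  # 9212
print(spliced_length(sor_exons_double_spliced))  # 2659
```

Under these assumptions the doubly spliced sor message comes to roughly 2.7 kb, in line with the estimate given in the text.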

Table 1. Immune reactivity of the sor, tat, and 3' orf gene products

Columns: Number; Diagnosis; Gene product (sor, tat, 3' orf).
Diagnoses listed, in order: AIDS (four sera), ARC (two sera),
healthy homosexual (four sera, through no. 10), healthy
heterosexual (four sera, through no. 14).
[The per-serum reactivity entries are not legible in this copy.]
+, Reactive; ±, detectable; ++, strongly reactive; -, not detectable.

Muesing et al. (8) have suggested that the sor gene consists
of two exons generating a message of about 5.0 kb. Their
suggestion is inconsistent with clone 12 that contains an
intron located within their suggested second exon. It is
possible to postulate other combinations of potential 5' donor
and acceptor splice sites to generate a 5-kb message involving
double splicing. It is, of course, possible that more than one
species of the sor mRNA is synthesized utilizing alternative
splicing events. We have previously reported four abundant
mRNAs of 9.4 kb, 4.2 kb, 2.0 kb, and 1.8 kb in HTLV-III-infected
cells (10, 11). We also observed other less abundant
RNA species of about 7 kb, 5 kb, 3.2 kb, and 2.8 kb in these
cells. One or more of these species could correspond to the
sor message.

The sor, tat, and 3' orf genes synthesize polypeptides with
apparent mobilities of 23-24 kDa, 14-15 kDa, and 26-28 kDa,
respectively. The 3' orf open reading frame in clone 3 and in clones
1 and 12 was translated into a polypeptide of 27-28 kDa and
25-26 kDa, respectively. This open reading frame contains
two initiation codons (ATG) 57 bp apart in phase in its 5'
portion (Fig. 1). We suggest that the first and second ATGs
are used for translation in pSP-3 DNA and pSP-1 and pSP-12
DNAs, respectively. Both of these ATG triplets are flanked
by the appropriate consensus sequence requisite for efficient
translation initiation by the eukaryotic ribosomes (18).
Furthermore, the coding potential of the open reading frames for
the sor, tat, and 3' orf genes, starting from the first in phase
initiation codon, is respectively 192, 86, and 206 amino acid
residues, predicting the respective polypeptides of about 20
kDa, 9 kDa, and 21 kDa. The observed mobility of the
products of these genes in NaDodSO4/PAGE was uniformly
higher than predicted. This may suggest anomalous
conformation and/or posttranslational modifications of the
proteins.
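The gap between predicted and observed sizes can be illustrated with a rough estimate. This is a sketch under a stated assumption: it uses a rule-of-thumb average residue mass of about 110 Da, whereas the predictions in the text presumably derive from the actual amino acid compositions, so the numbers differ slightly. The names below are hypothetical.

```python
# Assumption: rule-of-thumb average mass per amino acid residue, in daltons.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Rough polypeptide mass in kDa from its residue count."""
    return n_residues * residue_mass_da / 1000.0


# Residue counts and observed gel mobilities (kDa) quoted in the text.
coding_potential = {"sor": 192, "tat": 86, "3' orf": 206}
observed_kda = {"sor": (23, 24), "tat": (14, 15), "3' orf": (26, 28)}

for gene, n_residues in coding_potential.items():
    estimate = estimate_mass_kda(n_residues)
    low, high = observed_kda[gene]
    print(f"{gene}: ~{estimate:.1f} kDa estimated, {low}-{high} kDa observed")
```

Even with this crude estimate, each product migrates more slowly than its residue count alone would suggest, which is the discrepancy the text attributes to conformation or modification.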

Fig. 6. Hydrophobicity profile and predicted secondary structure
of sor (a) and 3' orf (b) polypeptides analyzed according to Kyte
and Doolittle (23) and Chou and Fasman (24). Secondary structure is
depicted by boxes and vertical lines represent amino acid residues.
Open box, α-helix; hatched box, β-sheets; closed box, β-turns.

The products of the sor, tat, and 3' orf genes are immunogenic
in vivo, thus identifying three new antigens for
HTLV-III, in addition to the previously described gag and
env gene products (19-22). The three gene products appear
to be differentially immunoreactive and immunogenic, the 3'
orf gene product apparently being the most potent and the sor
gene product being the least potent in this regard. The lesser
immunogenicity of the sor gene product may be due to its
diminished expression in vivo and its particular intracellular
localization, or it may be related to its structure (Fig. 6). The
predicted amino acid sequence of the sor gene product does
not contain a cluster of amino acid residues that will impart
to this protein hydrophilic structure with β-turns, two parameters
generally thought to be responsible for strong
immunogenicity (23, 24). Notably, the predicted amino acid
sequence of both the sor and tat gene products lacks the typical
sequence (-asparagine-Xaa-threonine or serine-) that generally
serves as a glycosylation site, and such a sequence is present
twice in the predicted sequence of the 3' orf gene product.
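The Asn-Xaa-Thr/Ser motif mentioned above is straightforward to locate in a protein sequence. The sketch below follows the motif as worded in the text (any residue at Xaa); many later definitions additionally exclude proline at that position. The example sequence is hypothetical and is not taken from any HTLV-III protein.

```python
import re

# Lookahead so that overlapping motifs are all reported.
# Follows the text's wording: Asn, any residue, then Thr or Ser.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein):
    """Return (1-based position, motif) for each Asn-Xaa-Thr/Ser site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# Hypothetical sequence used only to exercise the function.
example = "MGNASKNWTAAN"
print(find_sequons(example))  # [(3, 'NAS'), (7, 'NWT')]
```

A sequence with no such motif returns an empty list, which is the situation the text describes for the sor and tat products.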

With regard to any correlation between the progression of
the disease and expression of the HTLV-III specific genes,
the survey reported here is too small to detect meaningful
trends. We think it is premature to draw conclusions from the
observation that antibodies to the 3' orf gene product were
detected in all but one of the six sera from patients with AIDS
and ARC but in only one out of four sera from HTLV-III-positive
healthy homosexual individuals in this study. Further,
some of the normal human sera reacted, though poorly,
with the sor gene product. Although we cannot presently rule
out artifactual interactions, this may suggest that a normal
cellular gene with some homology to the sor gene exists, and
its product is synthesized in some instances. The differential
expression of the sor, tat, and 3' orf genes in vivo may reflect
mutual modulatory role(s) of the products of these genes.
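The caution about sample size can be made concrete with an exact test on the counts quoted above (five of six AIDS/ARC sera positive for antibodies to the 3' orf product, one of four healthy homosexual sera positive). This is an illustrative sketch only: the authors report no statistical test, and the choice of a two-sided Fisher exact test is an assumption made here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k positives in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables at least as unlikely as the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Rows: AIDS/ARC sera, healthy homosexual sera.
# Columns: antibody to the 3' orf product detected, not detected.
p_value = fisher_exact_two_sided(5, 1, 1, 3)
print(round(p_value, 3))  # 0.19
```

With only ten sera the difference does not approach conventional significance, which is consistent with the authors' view that conclusions would be premature.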

We thank M. Sarngadharan, M. Guroff, and their colleagues and
collaborators for providing serum samples used in this study.
Additionally, thanks are due to L. Jagodzinski and R. Liou for
assistance with DNA sequencing, and M. B. Eiden, C. Guo, and
S. F. Josephs for useful discussions.

Inactivation of two genes in Yersinia
pseudotuberculosis causes a significant
increase in virulence and may explain, in part,
the variations in virulence of Yersinia pestis
that accounts for the rise and fall of plague
('black death') epidemics. See page 522 and
News and Views. Cover shows 'St Charles
Borromeo gives communion to plague
victims', by Sigismondo Caula (E T Archive).

Our Solar System is not unique:
other stars in our Galaxy seem to
have giant planets and new
planetary systems are forming
elsewhere. See pages 467 and
474.

Target practice 

CD4-bearing T cells in vitro can 
capture, process and present 
gpl20, rendering uninfected T 
cells a target for the anti-HIV 
T-cell response, page 530. 

Take a neutrino . . . 

Results from neutrino detectors 
confirm that too few neutrinos 
reach us from the Sun, calling 
for either new physics or new 
astronomy to provide an ex- 
planation. See Review Article. 

Sink not source 

Biotite micas, previously
thought to have been the source
of leached copper in porphyry
copper deposits, are now shown
to be a sink. See pages 516 and 472.

Double agent 

Perforin, the molecule used by 
cytotoxic T cells to kill their tar- 
gets, is shown to be homologous 
to a component of the serum 
complement cytolytic system, 
pages 525 and 475. 

Guide to Authors 

This issue, page 546. 
lated, by defects in the processing machinery or by the delivery
of inhibitory signals. In vivo, the antigen presenting function
of T cells will generally be insignificant because very few
proteins will bind directly to T cells. However, there are instances,
for example the case of gp120, in which the situation might
change dramatically. The fact that gp120 can bind to CD4+ cells
and be selectively presented could therefore have
immunopathological consequences for HIV-1 infection. Because gp120 is
readily shed from the surface of HIV-1 infected cells, the
possibility exists that free gp120 might bind to uninfected CD4+
T cells and macrophages and target them for destruction by
gp120-specific cells. We are currently testing this possibility.

We thank Janette Millar for the preparation of the manuscript,
and Drs Polly Matzinger, Ronald Palacios, Michael Bzay and
Uwe Staerz for critical reading and comments. The Basel
Institute for Immunology was founded and is supported by F.
Hoffmann-La Roche & Co., Basel, Switzerland.
Identification of a protein encoded
by the vpu gene of HIV-1

Eric A. Cohen*, Ernest F. Terwilliger*,
Joseph G. Sodroski* & William A. Haseltine*†

* Dana-Farber Cancer Institute, Department of Pathology,
Harvard Medical School, and † Harvard School of Public Health, 44

Human immunodeficiency virus 1 (HIV-1) is the aetiological agent
of AIDS. The virus establishes lytic, latent and non-cytopathic
productive infection in cells in culture. The complexity of
virus-host cell interaction is reflected in the complex organization of
the viral genome. In addition to the genes that encode the virion
capsid and envelope proteins and the enzymes required for proviral
synthesis and integration common to all retroviruses, HIV-1 is
known to encode at least four additional proteins that regulate
virus replication, the tat, art, sor and 3' orf proteins, as well as
a protein of unknown function from the open reading frame called
R (refs 10-18). Close examination of the nucleic acid sequences of the
genomes of multiple HIV isolates raised the possibility that the
virus encodes a previously undetected additional protein. Here we
report that HIV-1 encodes a ninth protein and that antibodies to
this protein are detected in the sera of people infected with HIV-1.
This protein distinguishes HIV-1 isolates from the other human
and simian immunodeficiency viruses (HIV-2 and SIV) that
do not have the capacity to encode a similar protein.

Figure 1a is a schematic diagram of the open reading frames
of the region between the first coding exons of the tat and art
genes of HIV-1 and the envelope glycoprotein gene. In this
region many strains of the virus have the capacity to encode a
protein of 80-82 amino acids that initiates with an AUG codon
(Fig. 1b). To examine this possibility, two oligopeptides were
made that correspond in sequence to regions of the protein
which were predicted, on the basis of amino acid sequence, to
be hydrophilic. One corresponded to amino acids 29 to 41
(peptide 1), and the other to amino acids 73 to 81 (peptide 2)
(Fig. 1b). The amino acid sequences corresponded to the protein
that the BH10 substrain of the IIIB isolate was predicted to make.
The peptides were conjugated to keyhole limpet haemocyanin
and used to raise antibody in three rabbits each. After multiple
injections of the antigen, the rabbits were shown to produce
antibodies that recognized the oligopeptide (data not
shown).
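Choosing peptides from regions predicted to be hydrophilic can be sketched with a hydropathy scale. The text does not say which prediction method was used for the vpu peptides, so the Kyte-Doolittle scale below is an assumption, and the function names are hypothetical. The test input is the unrelated control peptide QEEAETATKTSSC named in the Fig. 2 legend.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over a peptide (one-letter codes)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def hydropathy_profile(sequence, window=7):
    """Sliding-window mean hydropathy; one value per window start."""
    return [mean_hydropathy(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]


control_peptide = "QEEAETATKTSSC"
print(mean_hydropathy(control_peptide) < 0)  # True: hydrophilic on average
print(len(hydropathy_profile(control_peptide)))  # 7 windows of width 7
```

Windows with strongly negative averages mark hydrophilic stretches, which are the usual candidates for surface-exposed peptide antigens.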

The ability of the region between the first coding exon of tat
and the env gene to encode a protein was first examined by an
in vitro translation assay in a reticulocyte lysate, using RNA
made in vitro. RNA was made from a restriction fragment,
2,231 nucleotides long, of an HIV provirus that spanned the
region between the first coding exons of the tat, art and part of
the env genes. The template was derived from a fragment of the
provirus of the ELI strain of HIV-1 placed 3' to the SP6
bacteriophage RNA polymerase promoter (Fig. 1a). This strain
was selected as it contains an open reading frame in this region
that initiates with an AUG codon (Fig. 1b). The viral sequences
present in this RNA transcript, as shown in Fig. 1a, extend from
the 5' end of the first coding exon of the tat (BamHI site) to
1,839 nucleotides (BglII site) within the env sequence. However,
the initiation codon for the tat gene is not intact in this RNA
as BamHI cleaves the ELI proviral strain between the T and G
residues of the tat initiation codon.

Proteins produced in the in vitro lysate using the RNA derived
from this proviral fragment were labelled with 35S-methionine
and separated by size using sodium dodecyl sulphate-polyacrylamide
gel electrophoresis (SDS-PAGE) (Fig. 2a). The proteins
synthesized in this system are displayed in lane 2. The proteins
precipitated by rabbit anti-peptide-2 serum are also shown. Two
proteins of relative molecular mass of approximately 15,000
(15K) and 16,000 (16K) are evident in the unfractionated extract
and are precipitated by the rabbit antisera. The 15K and 16K
proteins are not precipitated by the pre-immune rabbit sera (lane
3). All three of the antisera to peptide 2 recognize both proteins
(lane 4) as do the antisera to peptide 1, albeit more weakly (data
not shown). The data of Fig. 2a also show that peptide 2
competes for recognition of the 15K and 16K proteins by antisera
(lane 5). However, neither peptide 1 (lane 6) nor an unrelated
peptide (lane 7) competes with anti-peptide-2 serum.

To confirm the origin of the proteins, RNA from other proviral
fragments was used in the in vitro translation assay. In one set
of experiments, the template used for synthesis of RNA was
truncated by restriction enzyme cleavage either seven nucleotides
5' to the proposed AUG codon (RsaI site) or 30 nucleotides
3' to the proposed AUG codon (BbvI site) (Fig. 1a). No specific
protein products recognized by anti-peptide-2 antiserum were
observed in these experiments (Fig. 2b, lanes 1 and 2). When
the template used for synthesis of RNA was cleaved 102 nucleotides
3' to the proposed stop codon (NdeI site), the 15K and
16K proteins were detected using anti-peptide-2 serum (Fig. 2b,
lanes 3 and 4).
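The logic of the truncation experiment can be sketched as a coordinate check: a run-off transcript can only yield a product recognized by anti-peptide-2 serum if it still encodes residues 73 to 81. In the sketch below the absolute AUG position and the 81-codon reading frame are hypothetical placeholders, and the cut positions are approximations of the offsets quoted in the text.

```python
def codons_on_runoff(cut_position, aug_position, orf_codons):
    """Count complete ORF codons, from the AUG, on a run-off transcript.

    Positions are 1-based template coordinates and the transcript is
    assumed to end at cut_position.
    """
    if cut_position < aug_position:
        return 0
    complete = (cut_position - aug_position + 1) // 3
    return min(complete, orf_codons)


def epitope_encoded(cut_position, aug_position, orf_codons, epitope_last_residue):
    """True if the transcript encodes the ORF through the epitope's last residue."""
    return codons_on_runoff(cut_position, aug_position, orf_codons) >= epitope_last_residue


AUG = 100        # hypothetical coordinate of the A of the AUG
ORF_CODONS = 81  # hypothetical; the text gives 80-82 residues across strains
STOP_END = AUG + 3 * (ORF_CODONS + 1) - 1  # last nucleotide of the stop codon

cuts = {
    "RsaI": AUG - 7,         # seven nucleotides 5' to the AUG
    "BbvI": AUG + 2 + 30,    # 30 nucleotides 3' to the AUG
    "NdeI": STOP_END + 102,  # 102 nucleotides 3' to the stop codon
}

# Peptide 2 spans residues 73 to 81 according to the text.
results = {site: epitope_encoded(pos, AUG, ORF_CODONS, 81)
           for site, pos in cuts.items()}
print(results)  # {'RsaI': False, 'BbvI': False, 'NdeI': True}
```

Only the NdeI-cut template retains the full reading frame, matching the pattern of detection reported for Fig. 2b.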

To examine the possibility that the proteins corresponding to
the 15K and 16K products are produced in natural infections,
the ability of antisera from normal and AIDS patients to
recognize the protein synthesized in vitro was tested. The data of Fig.
2c demonstrate that HIV seropositive patient antisera recognize
both the 15K and 16K proteins (lanes 2, 4 and 5). The ability
of antiserum to precipitate the two proteins is partially competed
out by peptide 2 (lane 3). The 15K and 16K proteins are not
recognized by normal human serum (lane 1). However, all of
the 19 sera of HIV-1 infected patients that immunoprecipitated
the truncated env product were found to precipitate both the
15K and 16K proteins (data not shown). Antisera from
HIV-2-infected humans or from SIV-infected macaques do not
precipitate either protein (Fig. 2c, lanes 6 and 7).

Fig. 1 a, Genetic organization of the central region of HIV-1 (ELI
isolate, ref. 24) compared with SIV and HIV-2 (ROD isolate,
ref. 19). Arrows indicate the initiator AUG codons in viral genes.
SP6 plasmid used to synthesize messenger RNA: a BamHI to BglII
fragment, 2,231 nucleotides long, from the HIV ELI provirus that
spanned the region between the first coding exons of the tat, art
and part of the env gene was cloned 3' to the SP6 bacteriophage
RNA polymerase promoter. Internal restriction sites used to
linearize the plasmids are indicated. b, Alignment of the vpu gene
protein sequence. The ELI isolate is taken as reference. Gaps (-)
were introduced to optimize the alignment. Asterisks indicate
amino acid identity. The HIV isolates compared include ELI, MAL
(ref. 24), HXBc2, BH-10, BH-8, pHXB3 (ref. 6), BRU (ref. 7) and
USF2 (ref. 8). USF2 contains a termination codon at position 39
([I]). However, a -1 frameshift results in an extension of 43 amino
acids that are well conserved when compared with the ELI U
sequence.

Fig. 2 In vitro characterization of the vpu gene product. a, pEU
plasmid was linearized by digestion at an EcoRI site located in
the polylinker 3' to the HIV ELI insert and used as template for in
vitro transcription by SP6 RNA polymerase as described, except
that the concentrations of GTP and cap analogue m7GpppG were
raised to 0.2 and 1.0 mM respectively. Messenger RNAs were
labelled with [5'-3H]CTP and purified as described. In vitro
translation of equimolar amounts of RNA (equal amounts of
radioactivity) was performed in reticulocyte lysate. Incubation
was at 30 °C for 30 min in the presence of 35S-methionine. Labelled
products were analysed directly by 15% SDS-PAGE (lane 2) or
immunoprecipitated beforehand with pre-immune rabbit serum
(lane 3); anti-peptide-2 serum (lane 4); anti-peptide-2 serum in
the presence of 500 µM of peptide 2 (lane 5), peptide 1 (lane 6),
or an unrelated peptide QEEAETATKTSSC (lane 7). Lane 1
represents a total translation reaction with no mRNA added. b,
pEU plasmid was linearized with the following restriction enzymes:
RsaI (lane 1); BbvI (lane 2) and NdeI (lanes 3 and 4).
SP6-generated RNAs were translated in vitro and immunoprecipitation
was performed on the labelled products using anti-peptide-2 serum
(lanes 1, 2 and 4). Lane 3 represents a total in vitro translation
reaction. c, After in vitro translation of SP6-generated pEU RNA,
the labelled products were immunoprecipitated as described,
except that 1 M NaCl was used in the immunoprecipitation
reaction. Immunoprecipitation with a pool of normal human serum
(lane 1); HIV-1-infected human sera (lanes 2, 4 and 5);
HIV-1-infected patient serum in the presence of 500 µM of peptide 2
(lane 3); HIV-2-infected human serum (lane 6) or SIV-infected
Rhesus macaque serum (lane 7). Fifteen HIV-2-infected human
sera and four SIV-infected Rhesus macaque sera were tested.
These sera were demonstrated to specifically react with HIV-2 or
SIV proteins by immunoprecipitation and Western blot analysis
(not shown). None of these antisera immunoprecipitated p15(vpu)
and p16(vpu). Immunoprecipitates were resolved on 15%
SDS-PAGE.

We examined whether the anti-peptide-2 serum recognized
the 15K and 16K proteins in three cell lines that constitutively
express HIV-1 proteins art and env encoded by the 3' half of
the virus. Cloned HeLa cell lines that have stably integrated the
region between the art gene and the 3' long terminal repeat
(LTR) of the proviral ELI, HXBc2 and MAL strains of HIV
were isolated (Terwilliger et al., in preparation). The plasmids
used for construction of these cell lines contained the HIV LTR
juxtaposed 5' to the initiation codon of the art gene. The tat
gene product was supplied in trans. Figure 3b shows that the
anti-peptide-2 antiserum specifically recognized a 15K protein
in the cell line derived from the ELI provirus (lane 3) that
comigrates with the 15K protein made in vitro (lane 2). The
same antiserum does not recognize a protein in the cell line that
expresses proteins derived from the MAL (Fig. 3c) or the HXBc2
(Fig. 3d) proviruses. This result was expected as neither of the
proviruses contains a properly positioned initiation codon at the
5' end of the open reading frame (Fig. 1b). The absence of
detection of the 15K protein by the HIV-1 patient antiserum in
the cell line derived from the ELI provirus is probably due to
both the low antibody titre in the antiserum used and the much
smaller amount of the 15K protein in the cell line compared to
the in vitro translation products.

Fig. 3 Identification of the vpu gene product
in cell lines constitutively expressing proteins
encoded by the 3' half of the HIV-1 ELI provirus.
HeLa cell lines that have the region of
the HIV-1 provirus between the art gene and
the 3' LTR of the HIV proviral ELI, HXBc2
and MAL strains stably integrated were constructed
(Terwilliger et al., in preparation). The
parental cells used to isolate these cell lines had
previously been selected for constitutive
expression of the tat gene product, following
infection with a retroviral vector carrying the
tat coding sequences. Cells were labelled with
35S-methionine and cysteine and cell lysates
were immunoprecipitated as described. a,
HeLa tat cell line lysates immunoprecipitated
with anti-peptide-2 serum (lane 1) or
HIV-1-infected patient serum (lane 2). b, HeLa tat ELI
lysates immunoprecipitated with pre-immune
rabbit serum (lane 1); anti-peptide-2 serum (lane 3); anti-peptide-2 serum in the presence of 500 µM of peptide 2 (lane 4); normal human
serum (lane 5); HIV-1-infected patient serum (lane 6). Lane 2 represents an immunoprecipitation of labelled in vitro translated product from
pEU RNA with anti-peptide-2 serum. c, HeLa tat MAL lysate immunoprecipitated with pre-immune rabbit serum (lane 1); anti-peptide-2
serum (lane 2); normal human serum (lane 3) and HIV-1-infected patient serum (lane 4). d, HeLa tat IIIB lysate immunoprecipitated with
pre-immune rabbit serum (lane 1); anti-peptide-2 serum (lane 2); normal human serum (lane 3) and HIV-1-infected patient serum (lane 4).

The experiments presented here demonstrate that HIV-1 has
the capacity to encode a previously unrecognized protein. The
open reading frame from which this protein is synthesized was
originally designated U (ref. 7) and so we propose to call the
gene vpu, for viral protein U, and the proteins produced p15(vpu)
and p16(vpu). The product of vpu is made upon HIV-1 infection
as antisera from the majority of HIV-1-infected people surveyed
have antibodies that recognize the protein.

All HIV-1 proviral strains isolated contain an open reading
frame in the region corresponding to vpu. However, the ability
of the individual proviral strains to produce a protein from this
region is compromised in some strains by a single point mutation
that prevents vpu expression. Indeed, different proviral strains
from the same viral isolate differ in their ability to encode vpu:
independent proviral clones of the IIIB isolate, HXBc2, BH10,
BH-8 and BH-3 are an example (Fig. 1b). There is a similar
variation in the ability of individual proviral clones to encode
other viral proteins, for example, the 3' orf product. The
mutation that truncates the protein product of the IIIB 3' orf yields
a virus that replicates more rapidly in culture than the wild-type
virus. The virus produced by transfection with HXBc2 can
grow in T cells in culture, implying that a virus which cannot
express vpu can replicate. However, the ability of the vpu-negative
virus to replicate does not rule out the possibility that the vpu product
is important in regulation of viral replication or pathogenesis.

The vpu gene distinguishes HIV-1 from HIV-2 and SIV
infections. A computer-assisted search for proteins similar to
p15/16(vpu) showed that HIV-2 and SIV do not encode a similar
protein. HIV-2 and SIV strains do contain an open reading
frame that is missing from that of HIV-1 isolates, the X open
reading frame, but there is no predictable similarity in the
predicted protein products of vpu and the X open reading frame.
None of the sera of HIV-2-infected patients surveyed contained
antibodies to the vpu product, nor were antibodies to vpu
detected in Rhesus macaques infected with SIV.



We note that vpu is highly conserved amongst the HIV-1
proviral sequences isolated (Fig. 16), and that vpu is removed
by splicing from viral messenger RNAs that encode regulatory
proteins. It is therefore predicted that the vpu product is
not made in the absence of the art gene product, as only fully
spliced messenger RNAs accumulate in the absence of this
product. We suspect that the vpu product is made late in
infection like virion proteins.

Summary

The complete 9193-nucleotide sequence of the prob- 
able causative agent of AIDS, lymphadenopathy-asso- 
ciated virus (LAV), has been determined. The deduced 
genetic structure is unique: it shows, in addition to the
retroviral gag, pol, and env genes, two novel open
reading frames we call Q and F. Remarkably, Q is located
between pol and env and F is half-encoded by
the U3 element of the LTR. These data place LAV apart
from the previously characterized family of human
T cell leukemia/lymphoma viruses.

Introduction 

The recent onset of severe opportunistic infections among
previously healthy male homosexuals has led to the characterization
of the acquired immune deficiency syndrome
(AIDS) (Gottlieb et al., 1981; Masur et al., 1981). The disease
has spread dramatically, and new high-risk groups
have been identified: patients receiving blood products,
intravenous drug addicts, and individuals originating from
Haiti and Central Africa (Piot et al., 1984). AIDS is a fatal
disease, and there is at present no specific treatment. The
causative agent was suspected to be of viral origin since
the epidemiological pattern of AIDS was consistent with
a transmissible disease, and cases had been reported after
treatment involving ultrafiltered anti-hemophilia preparations
(Daly and Scott, 1983). A decisive step in AIDS research
was the discovery of a novel human retrovirus
called lymphadenopathy-associated virus (LAV) (Barré-Sinoussi
et al., 1983). The properties of the virus consistent
with its etiological role in AIDS are: the recovery of
many independent isolates from patients with AIDS or
related diseases (Montagnier et al., 1984); high LAV
seropositivity among these populations (Brun-Vézinet et
al., 1984); a tropism and cytopathic effect in vitro for the
helper/inducer T-lymphocyte subset T4 (Klatzmann et al.,
1984), also found depleted in vivo.

Other groups have reported the isolation of human
retroviruses, the human T cell leukemia/lymphoma/lymphotropic
virus type III (HTLV-III) (Popovic et al., 1984) and
the AIDS-associated retrovirus (ARV), which display biological
and sero-epidemiological properties very similar to
if not identical with those of LAV (Levy et al., 1984; Popovic
et al., 1984; Schüpbach et al., 1984). Both LAV and HTLV-III
genomes have been molecularly cloned (Alizon et al.,
1984; Hahn et al., 1984). Their restriction maps show
remarkable agreement, including a Hind III restriction site
polymorphism, bearing in mind the variability of this virus
(Shaw et al., 1984) and confirming that these two viruses
represent a single viral lineage.

In addition to its obvious diagnostic and therapeutic
potential, the LAV DNA nucleotide sequence is essential
to an understanding of the genetics and molecular biology
of the virus and its classification among retroviruses. We
report here the complete 9193-nucleotide sequence of the
LAV genome established from cloned proviral DNA.

Results 

DNA Sequence and Organization of the LAV Genome

We have reported previously the molecular cloning of both
cDNA and integrated proviral forms of LAV (Alizon et al.,
1984). The recombinant phage clones were isolated from
a genomic library of LAV-infected human T-lymphocyte
DNA partially digested by Hind III. The insert of recombinant
phage λJ19 was generated by Hind III cleavage
within the R element of the long terminal repeat (LTR).
Thus each extremity of the insert contains one part of the
LTR. We have eliminated the possibility of clustered Hind
III sites within R by sequencing part of an LAV cDNA
clone, pLAV 75 (Alizon et al., 1984), corresponding to this
region (data not shown). Thus the total sequence information
of the LAV genome can be derived from the λJ19
clone.

Using the M13 shotgun cloning and dideoxy chain termination
method (Sanger et al., 1977), we have determined
the nucleotide sequence of the λJ19 insert. The reconstructed
viral genome with two copies of the R sequence
is 9193 nucleotides long. The numbering system starts at
the cap site (see below) of virion RNA (Figure 1).

The viral (+) strand contains the statutory retroviral
genes encoding the core structural proteins (gag), reverse
transcriptase (pol), and envelope protein (env), and two
extra open reading frames (orf) that we call Q and F (Table
1). The genetic organization of LAV, 5'LTR-gag-pol-Q-env-F-3'LTR,
is unique. Whereas in all replication-competent
retroviruses pol and env genes overlap, in LAV they are
separated by orf Q (192 amino acids) followed by four
small (<100 triplets) orf. The orf F (206 amino acids)
slightly overlaps the 3' end of env and is remarkable in that
it is half-encoded by the U3 region of the LTR.

Such a structure clearly places LAV apart from previously
sequenced retroviruses (Figure 2). The (-) strand is
apparently noncoding. The additional Hind III site of the
LAV clone λJ81 (with respect to λJ19) maps to the apparently
noncoding region between Q and env (positions
5166-5745). Starting at position 5501 is a sequence
(AAGCCT) that differs by a single base (underlined) from
the Hind III recognition sequence. It is anticipated that
many of the restriction site polymorphisms between different
isolates will map to this region.
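
The single-base difference described above can be illustrated with a short sketch. It assumes the standard Hind III recognition sequence AAGCTT, which the text does not spell out; the function name is hypothetical and chosen only for this example.

```python
# Compare a candidate hexamer with the Hind III recognition sequence.
# Assumption: the recognition sequence is the standard AAGCTT.
HINDIII_SITE = "AAGCTT"


def mismatches(candidate, site=HINDIII_SITE):
    """Return (1-based position, site base, candidate base) for each difference."""
    if len(candidate) != len(site):
        raise ValueError("candidate and site must have the same length")
    return [
        (i + 1, s, c)
        for i, (s, c) in enumerate(zip(site, candidate))
        if s != c
    ]


# Hexamer quoted in the text as starting at position 5501.
near_site = "AAGCCT"
print(mismatches(near_site))
```

Under that assumption, the quoted hexamer differs from the recognition sequence at one position only, so a single point change between isolates would be enough to create or destroy the site.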
[Figure 1, sequence panels: nucleotide sequence of the viral genome with predicted amino acid translations, position markers 2300 to 9193. The sequence text is not recoverable.]



Figure 1. Complete DNA Sequence of Viral Genome (LAV-1a)

The sequence was reconstructed from the sequence of phage λJ19 insert. The numbering starts at the cap site, which was located experimentally
(see above). Important genetic elements, major open reading frames, and their predicted products are indicated together with the Hind III cloning
sites. The potential glycosylation sites in the env gene are overlined. The NH2-terminal sequence of p25 determined by protein microsequencing
is boxed (Genetic Systems, personal communication).

Each nucleotide was sequenced on average 5.3 times: 85% of the sequence was determined on both strands and the remainder was sequenced
at least twice from independent clones. The base composition is T, 22.2%; C, 17.8%; A, 35.8%; G, 24.2%; G + C, 42%. The dinucleotide CpG
is greatly under-represented (0.9%) as is common among eukaryotic sequences (Bird, 1980).
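
The base composition figures in the legend can be checked against each other, and the CpG depletion can be put in rough numerical terms. The sketch below assumes that the 0.9% figure is the share of CpG among all dinucleotides and that the expected share is the product of the C and G frequencies (independent neighbours). Neither assumption is stated in the legend, so the ratio is only indicative.

```python
# Base composition quoted in the Figure 1 legend (percent).
composition = {"T": 22.2, "C": 17.8, "A": 35.8, "G": 24.2}

total = sum(composition.values())
gc_content = composition["G"] + composition["C"]

# Expected CpG share if neighbouring bases were independent (assumption).
expected_cpg = composition["C"] * composition["G"] / 100.0
observed_cpg = 0.9  # percent, as quoted in the legend

ratio = observed_cpg / expected_cpg

print(f"total = {total:.1f}%  G+C = {gc_content:.1f}%")
print(f"expected CpG = {expected_cpg:.2f}%  observed/expected = {ratio:.2f}")
```

On these assumptions CpG occurs at roughly one fifth of the frequency expected from the base composition alone.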



The LTR 

The organization of a reconstructed LTR and viral flanking
elements is shown schematically in Figure 3. The LTR is
638 bp long and displays usual features (Chen and Barker,
1984): it is bounded by an inverted repeat (5'ACTG) including
the conserved TG dinucleotide (Temin, 1981); adjacent
to 5' LTR is the tRNA primer binding site (PBS), complementary
to tRNA(Lys) (Raba et al., 1979); adjacent to 3'
LTR is a perfect 15 bp polypurine tract. The other three
polypurine tracts observed between nucleotides
8200-8800 are not followed by a sequence that is complementary
to that just preceding the PBS.

The limits of U5, R, and U3 elements were determined
as follows. U5 is located between PBS and the polyadenylation
site established from the sequence of the 3' end of
oligo(dT)-primed LAV cDNA (Alizon et al., 1984). Thus U5
is 84 bp long. The length of R+U5 was determined by synthesizing
tRNA-primed LAV cDNA. After alkaline hydroly-



Table 1. Locations and Sizes of Viral Open Reading Frames

orf      1st Triplet     Met      Stop    No. Amino Acids    Calc. Mr
gag          312         336     1,836         500             55,841
pol        1,631       1,934     4,640       (1,003)         (113,629)
orf Q      4,554       4,587     5,163         192             22,487
env        5,746       5,767     8,350         861             97,376
orf F      8,324       8,354     8,972         206             23,316

The nucleotide coordinates refer to the first base of the first triplet (1st triplet), of the first methionine (initiation) codon (Met), and of the stop codon
(Stop). The numbers of amino acids and molecular weights are those calculated for unmodified precursor products starting at the first methionine
through to the end, with the exception of pol, where the size and Mr refer to that of the whole orf.
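
The coordinates and amino acid counts in Table 1 are linked by simple arithmetic, which the following sketch checks. It uses only the table values; the dictionary layout and function name are illustrative. Following the table footnote, the count runs from the first methionine codon to the stop codon, except for pol, where it runs from the first triplet of the orf.

```python
# (start coordinate, stop coordinate, amino acids listed in Table 1).
# "start" is the first base of the Met codon, except for pol, where it is
# the first base of the first triplet of the orf (see the table footnote).
TABLE_1 = {
    "gag": (336, 1836, 500),
    "pol": (1631, 4640, 1003),
    "orf Q": (4587, 5163, 192),
    "env": (5767, 8350, 861),
    "orf F": (8354, 8972, 206),
}


def codon_count(start, stop):
    """Number of codons between start and the first base of the stop codon."""
    span = stop - start
    if span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


for name, (start, stop, listed) in TABLE_1.items():
    computed = codon_count(start, stop)
    print(f"{name:6s} computed = {computed:5d}  listed = {listed:5d}")
```

Each computed value agrees with the listed number of amino acids, which also confirms that every start and stop coordinate pair lies in a single reading frame.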



Figure 2. Comparison of the Genome Organization of LAV with Those
of Human T Cell Leukemia/Lymphoma Virus Type I (HTLV-I) (Seiki et
al., 1983), Moloney Murine Leukemia Virus (MoMuLV) (Shinnick et al.,
1981), and Rous Sarcoma Virus (RSV) (Schwartz et al., 1983)
The positions and sizes of viral genes are drawn to scale (open boxes)
and the viral genomes (RNA forms) are delimited by brackets.

sis of the primer, R+U5 was found to be 181 ±1 bp (Figure
4). Thus R is 97 bp long and the cap site at its 5' end
can be located. Finally, U3 is 456 bp long. The LAV LTR
also contains characteristic regulatory elements: a polyadenylation
signal sequence AATAAA 19 bp from the R-U5
junction, and the sequence ATATAAG, which is very likely
the TATA box, 22 bp 5' of the cap site. There are no long
direct repeats within the LTR. Interestingly, the LAV LTR
shows some similarities to that of the mouse mammary tumor
virus (MMTV) (Donehower et al., 1981). They both use
tRNA(Lys) as a primer for (-) strand synthesis, whereas all
other exogenous mammalian retroviruses known to date
use tRNA(Pro) (Chen and Barker, 1984). They possess very
similar polypurine tracts; that of LAV is AAAAGAAAAGGGGGG
while that of MMTV is AAAAAAGAAAAAAGGGGG.
It is probable that the viral (+) strand synthesis is discontinuous
since the polypurine tract flanking the U3 element
of the 3' LTR is found exactly duplicated in the 3' end of orf
pol, at 4331-4346. In addition, MMTV and LAV are exceptional
in that the U3 element can encode an orf. In the
case of MMTV, U3 contains the whole orf while, in LAV, U3
contains 110 codons of the 3' half of orf F.
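
The way the R length follows from the two measurements, and the description of the LAV polypurine tract, can be written out as a small check. This is only a sketch using the figures quoted in the text; the variable and function names are illustrative.

```python
# Element lengths quoted in the text (bp).
R_PLUS_U5 = 181   # strong-stop cDNA length, quoted with an error of 1 bp
U5 = 84           # from the polyadenylation site mapping
U3 = 456

# R is what remains of the strong-stop cDNA once U5 is subtracted.
R = R_PLUS_U5 - U5
ltr_from_parts = U3 + R + U5

# LAV polypurine tract as quoted in the text.
LAV_PPT = "AAAAGAAAAGGGGGG"


def is_polypurine(seq):
    """True if the sequence is non-empty and contains only A and G."""
    return len(seq) > 0 and set(seq.upper()) <= {"A", "G"}


print("R =", R)
print("U3 + R + U5 =", ltr_from_parts)
print("tract length =", len(LAV_PPT), "polypurine:", is_polypurine(LAV_PPT))
```

The subtraction reproduces the 97 bp quoted for R, and the summed elements fall within 1 bp of the 638 bp quoted for the whole LTR, which is the same margin as the stated uncertainty of the strong-stop measurement.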

Viral Proteins 
gag 

Near the 5' extremity of the gag orf is a "typical" initiation
codon (Kozak, 1984) (position 336), which is not only the
first in the gag orf, but the first from the cap site. The
precursor protein is 500 amino acids long. The calculated
Mr of 55,841 agrees with the 55 kd gag precursor polypeptide
(Luc Montagnier, unpublished results). The N-terminal
amino acid sequence of the major core protein
p25, obtained by microsequencing (Genetic Systems, personal
communication), matches perfectly with the translated
nucleotide sequence starting from position 732 (see
Figure 1). This formally makes the link between the cloned
LAV genome and the immunologically characterized LAV
p25 protein. The protein encoded 5' of the p25 coding sequence
is rather hydrophilic. Its calculated Mr of 14,866 is
consistent with that of the gag protein p18. The 3' part of
the gag region probably codes for the retroviral nucleic
acid binding protein (NBP). Indeed, as in HTLV-I (Seiki et



Figure 3. Schematic Representation of the LAV Long Terminal Repeat
(LTR)

The LTR was reconstructed from the sequence of λJ19 by juxtaposing
the sequences adjacent to the Hind III cloning sites. Sequencing of
oligo(dT)-primed LAV cDNA clone pLAV75 (Alizon et al., 1984) rules out
the possibility of clustered Hind III sites in the R region of LAV. LTR are
limited by an inverted repeat sequence (IR). Both of the viral elements
flanking the LTR have been represented as tRNA primer binding site
(PBS) for 5' LTR and polypurine tract (PU) for 3' LTR. Also indicated
are a putative TATA box, the cap site, polyadenylation signal (AATAAA),
and polyadenylation site (CAA). The location of the open reading frame
F (648 nucleotides) is shown above the LTR scheme.

al., 1983) and RSV (Schwartz et al., 1983), the motif Cys-
X2-Cys-X,.,-Cys common to all NBP (Oroszlan et al., 1984)
is found duplicated (nucleotides 1509 and 1572 in LAV sequence).
Consistent with its function, the putative NBP is
extremely basic (17% Arg + Lys).
pol 

The reverse transcriptase gene can encode a protein of up
to 1003 amino acids (calculated Mr = 113,629). Since the
first methionine codon is 92 triplets from the origin of the
open reading frame, it is possible that the protein is translated
from a spliced messenger RNA, giving a gag-pol
polyprotein precursor.

The pol coding region is the only one in which significant
homology has been found with other retroviral protein
sequences, three domains of homology being apparent.
The first is a very short region of 17 amino acids (starting
at 1856). Homologous regions are located within the p15
gag protease of RSV (Dittmar and Moelling, 1978) and a polypeptide
encoded by an open reading frame located between
gag and pol of HTLV-I (Figure 5) (Schwartz et al.,
1983; Seiki et al., 1983). This first domain could thus correspond
to a conserved sequence in viral proteases. Its
different locations within the three genomes may not be
significant since retroviruses, by splicing or other mechanisms,
express a gag-pol polyprotein precursor (Schwartz
et al., 1983; Seiki et al., 1983). The second and most extensive
region of homology (starting at 2048) probably
represents the core sequence of the reverse transcriptase.
Over a region of 250 amino acids, with only minimal
insertions or deletions, LAV shows 38% amino acid identity
with RSV, 25% with HTLV-I, and 21% with MoMuLV
(Shinnick et al., 1981) while HTLV-I and RSV show 38%
identity in the same region. A third homologous region is
situated at the 3' end of the pol reading frame and corresponds
to part of the pp32 peptide of RSV that has exonuclease
activity (Misra et al., 1982). Once again, there
is greater homology with the corresponding RSV sequence
than with HTLV-I.
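
Identity percentages such as those quoted for the pol region depend on how aligned positions are counted. The sketch below shows one common convention, which is to count identical residues over the aligned columns that contain no gap. The text does not say which convention was used, so this is an assumption, and the two short peptides are invented for illustration and are not viral sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Invented toy alignment (not LAV, RSV, or HTLV-I sequence).
toy_a = "MKV-LDTG"
toy_b = "MRVALDSG"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Counting gap columns in the denominator instead would lower the figure, which is one reason identity values from different studies are not always directly comparable.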
env

The env open reading frame has a possible initiator
methionine codon very near the beginning (eighth triplet).
Figure 4. Synthesis of tRNA-Primed LAV cDNA for U5 (Strong-Stop
cDNA)

Lanes 1 and 2 show two different quantities of cDNA while lanes M and
M' represent markers. The strong-stop cDNA is 181 bases long with a
second, less intense band at 180. The error of estimation is ±1 bp. This
maps the major cap site to the second G residue of the sequence
CTGGGTCT within the LTR, 24 nucleotides downstream of the TATA
box. This guanosine residue is taken as the first base in the nucleotide
sequence shown in Figure 1.

If so, the molecular weight of the presumed env precursor
protein (861 amino acids, Mr calc = 97,376) is consistent
with the known size of the LAV glycoprotein (110 kd and
90 kd after glycosidase treatment; Luc Montagnier, unpublished).
There are 32 potential N-glycosylation sites (Asn-
X-Ser/Thr), which are overlined in Figure 1. An interesting
feature of env is the very high number of Trp residues at
both ends of the protein. There are three hydrophobic
regions, characteristic of the retroviral envelope proteins
(Seiki et al., 1983), corresponding to a signal peptide (encoded
by nucleotides 5815-5850 bp), a second region
(7315-7350 bp), and a transmembrane segment (7831-
7896 bp). The second hydrophobic region (7315-7350 bp)
is preceded by a stretch rich in Arg + Lys. It is possible
that this represents a site of proteolytic cleavage, which,
by analogy with other retroviral proteins, would give an external
envelope polypeptide and a membrane-associated
protein (Seiki et al., 1983; Kiyokawa et al., 1984). A striking
feature of the LAV envelope protein sequence is that the
region following the transmembrane segment is of unusual
length (150 residues). The env protein shows no
Figure 5. Location of a Short Stretch of Homology in the gag-pol Region
of the LAV, HTLV-I (Seiki et al., 1983) and RSV (Schwartz et al.,
1983) Genomes

Conserved amino acids are boxed. Homologous region is shown by
the solid bar in the schema. Each virus is organized differently in this
region but the sequence in the RSV genome maps to p15 gag, which
has a protease-associated function.

homology to any sequence in protein data banks. The
small amino acid motif common to the transmembrane
proteins of all leukemogenic retroviruses (Cianciolo et al.,
1984) is not present in LAV env.
Q and F
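
The potential N-glycosylation sites mentioned above follow the pattern Asn-X-Ser/Thr, and a scan for that pattern can be sketched as below. The pattern is taken exactly as written in the text, with any residue allowed at X (some later tools exclude proline there; that refinement is not applied here). The peptide is an invented example and not env sequence.

```python
import re

# Asn, any residue, then Ser or Thr. The lookahead lets overlapping
# sites be reported (for example "NNS" followed immediately by "NSS").
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein):
    """Return (1-based position of Asn, matched triplet) for every site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# Invented toy peptide (not LAV env).
toy_peptide = "MNGTANNSSKNPT"
for position, triplet in find_sequons(toy_peptide):
    print(position, triplet)
```

Applied to a translated env reading frame, the length of the returned list would give the number of potential sites under this definition.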

The location of orf Q is without precedent in the structure
of retroviruses. Orf F is unique in that it is half-encoded
by the U3 element of the LTR. Both orf have strong initiator
codons (Kozak, 1984) near their 5' ends and can encode
proteins of 192 amino acids (Mr calc = 22,487) and 206
amino acids (Mr calc = 23,316), respectively. Both putative
proteins are hydrophilic (pQ 49% polar, 15.1% Arg +
Lys; pF 46% polar, 11% Arg + Lys) and are therefore unlikely
to be associated directly with membrane. The function
for the putative proteins pQ and pF cannot be
predicted, as no homology was found by screening protein
sequence data banks. Between orf F and the pX protein
of HTLV-I there is no detectable homology. Furthermore,
their hydrophobicity/hydrophilicity profiles are
completely different. It is known that retroviruses can
transduce cellular genes, notably proto-oncogenes
(Weinberg, 1982). We suggest that orfs Q and F represent
exogenous genetic material and not some vestige of cellular
DNA because LAV DNA does not hybridize to the human
genome under stringent conditions (Alizon et al.,
1984), and their codon usage is comparable to that of the
gag, pol, and env genes (data not shown).

Relationship to Other Retroviruses 

Although LAV is both morphologically and biochemically
(Barré-Sinoussi et al., 1983) distinct from HTLV-I and -II, it remained
possible that its genome was organized in a similar
manner. The characteristic features of HTLV-I and -II
genomes, which they share with the more distantly related
bovine leukemia virus (BLV) (Rice et al., 1984), are not
observed in the case of LAV. These are: a region 3' of
the envelope gene consisting of a noncoding stretch
(600-900 bp), followed by a coding sequence of 307-357
codons (X open reading frame), which may slightly overlap
the U3 region of the LTR (Seiki et al., 1983; Rice et al.,
1984; Sagata et al., 1984) and, second, the LTR being
Table 2. Comparison of the Size of the LAV LTR and LTR-Related
Elements to Those of Other Retroviruses

            LTR      U3      R      U5     PU     PBS    IR
LAV         638     456     97      85     15     LYS     4
HTLV-I      759     355    228     176     1?     PRO     4i
HTLV-II     763     314    248     201     12i    PRO     4i
MMTV      1,332   1,197     11     124     19     LYS     8i
MoMuLV      594     449     68      77     13     PRO    13
RSV         335     234     21      80     11     TRP    15
SNV         601     420     97      80     13     PRO     9

Adapted from Chen and Barker (1984).
i = imperfect match or tract.
SNV = spleen necrosis virus (Shimotohno and Temin, 1982).




composed of unusually long U5 and R elements and the
polyadenylation signal being situated in U3 instead of R
(Seiki et al., 1983; Sagata et al., 1984; Shimotohno et al.,
1984). We show here that, in contrast, the 3' end of the LAV
envelope gene overlaps an open reading frame, termed F,
that has the coding capacity for 206 amino acids and extends
within the LTR (110 amino acids are encoded by the U3
region). The putatively encoded polypeptide (pF), the primary
structure of which can be deduced, does not show
any homology with the theoretical X gene products of the
HTLV/BLV family. Also, the U5 and R elements are shorter
(Table 2) and the polyadenylation signal is located within R,
as is the case for all retroviruses except the HTLV/BLV. Additionally,
LAV uses tRNA(Lys) as (-) strand primer, as opposed
to tRNA(Pro) employed by all other mammalian retroviruses
except MMTV (Donehower et al., 1981). Those
homologies detected between the polymerase and protease
domains of LAV and HTLV are also found in several
retroviruses, RSV in particular.

It has been reported that a cloned HTLV-III genome
hybridizes (Tm = 28°C) to sequences in the gag-pol and
X regions of HTLV-I and -II; although restriction maps of
cloned LAV and HTLV-III show almost perfect agreement
(Hahn et al., 1984), we were unable to detect any such
hybridization between LAV and HTLV-II (Tm = 55°C)
(Alizon et al., 1984). Indeed, there is a punctual region of
homology between LAV and HTLV-I (23/27 nucleotides
starting at position 1859 in the LAV sequence) but nothing
significant between the two viruses in the X region of
HTLV-I. One possible reason for this discrepancy is that
HTLV-III is subtly different from LAV. However, it was subsequently
reported that there was very minimal, if any, homology
between orf X (of HTLV-I) and HTLV-III (Shaw et al.,
1984).

Discussion 

Regulatory sequences carried by retroviral LTR are believed
to be involved in specific interactions between the
viral genome and the host cell (Srinivasan et al., 1984).
The LTR sequences of LAV are unique among retroviruses.
That could reflect an original mode of gene expression,
possibly in relation to particular transcriptional
factors present in the virus-harboring cell. This hypothesis
can be tested by studying the regulatory activity of the LAV
LTR sequences in transient or long-term experiments involving
an indicator gene and different cellular contexts.

The presence of the Q and F reading frames in addition
to the conventional gag-pol-env set of genes is unexpected.
One should now address the question of their role
in the viral cycle and pathogenicity by trying to characterize
their protein product(s). It is tempting to speculate on
a role of such polypeptide(s) in T4 cells' mortality, a problem
that can be studied by designing synthetic peptides
for antibody production or by using site-directed mutagenesis
of Q and F coding regions.

The peculiar genetic structure of LAV poses the ques- 
tion of its origin. The virus shares common tracts with other 
(apparently unrelated) retroviruses. For instance, the un- 
usually large size of the outer membrane glycoprotein 
(env) and a comparably sized genome are also observed 
in the case of lentiviruses such as Visna (Harris et al., 
1981; Querat et al., 1984). The presence of a large part of 
the F open reading frame in the LTR, and the use of 
tRNA(Lys) as a primer for (-) strand synthesis, is reminis- 
cent of the mouse mammary tumor virus. On the other 
hand, homologies in the pol gene would suggest that the 
LAV is closer to RSV than to any other retroviruses. Obvi- 
ously, no clear picture can be drawn from the DNA se- 
quence analysis as far as phylogeny is concerned. Thus, 
it may well be that LAV defines a new group of retroviruses 
that have been independently evolving for a considerable 
period of time, and not simply a variant recently derived 
from a characterized viral family. Both epidemiology and 
pathogeny of AIDS should be reconsidered with this idea 
in mind, when trying to answer such questions as these: 
Are there other human or animal diseases that are as- 
sociated with similarly organized viruses? Is there a precur- 
sor to AIDS-associated virus(es) normally present, in la- 
tent form, in human populations? What triggered in this 
case the recent spreading of pathogenic derivatives? 

Experimental Procedures 

M13 Cloning and Sequencing 

Total λJ19 DNA was sonicated, treated with the Klenow fragment of 
DNA polymerase plus deoxyribonucleotides (2 hr, 16°C), and fraction- 
ated by agarose gel electrophoresis. Fragments of 300-600 bp were 
excised, electroeluted, and purified by Elutip (Schleicher and Schuell) 
chromatography. DNA was ethanol-precipitated using 10 dextran 
T40 (Pharmacia) as carrier and ligated to dephosphorylated, Sma I- 
cleaved M13mp8 RF DNA using T4 DNA and RNA ligases (16 hr, 15°C) 
and transfected into E. coli strain TG-1. Recombinant clones were de- 
tected by plaque hybridization using the appropriate 32P-labeled LAV 
restriction fragments as probes. Single-stranded templates were pre- 
pared from plaques exhibiting positive hybridization signals and were 
sequenced by the dideoxy chain termination procedure (Sanger et al., 
1977) using α-35S-dATP (Amersham, 400 Ci/mmol) and buffer gradient 
gels (Biggin et al., 1983). Sequences were compiled and analyzed 
using the programs of Staden adapted by B. Caudron for the Institut 
Pasteur Computer Center (Staden, 1982). 

Strong-Stop cDNA 

LAV virions from infected T lymphocyte (Barré-Sinoussi et al., 1983) 
culture supernatant were pelleted through a 20% sucrose cushion and 
the cDNA (-) strand was synthesized as described previously (Alizon 
et al., 1984) except that no exogenous primer was used. After alkaline 
hydrolysis (0.3 M NaOH, 30 min, 65°C), neutralization, and phenol ex- 
traction, the cDNA was ethanol-precipitated and loaded onto a 6% 
acrylamide/8 M urea sequencing gel with sequence ladders as size 
markers. 
Human adult T-cell leukemia virus: Complete nucleotide sequence 
of the provirus genome integrated in leukemia cell DNA 

(human leukemia virus/provirus structure/translation frames/polyadenylylation model) 

MOTOHARU SEIKI, SEISUKE HATTORI, YOKO HIRAYAMA, AND MITSUAKI YOSHIDA 
Department of Viral Oncology, Cancer Institute, Kami-Ikebukuro, Toshima-ku, Tokyo, Japan 

Communicated by Takeshi Sugimura, March 14, 1983 

ABSTRACT Human retrovirus adult T-cell leukemia virus 
(ATLV) has been shown to be closely associated with human adult 
T-cell leukemia (ATL) [Yoshida, M., Miyoshi, I. & Hinuma, Y. (1982) 
Proc. Natl. Acad. Sci. USA 79, 2031-2035]. The provirus of ATLV 
integrated in DNA of leukemia T cells from a patient with ATL 
was molecularly cloned and the complete nucleotide sequence of 
9,032 bases of the proviral genome was determined. The provirus 
DNA contains two long terminal repeats (LTRs) consisting of 755 
bases, one at each end, which are flanked by a 6-base direct re- 
peat of the cellular DNA sequence. The nucleotides in the LTR 
could be arranged into a unique secondary structure, which could 
explain transcriptional termination within the 3' LTR but not in 
the 5' LTR. The nucleotide sequence of the provirus contains three 
large open reading frames, which are capable of coding for pro- 
teins of 48,000, 99,000, and 54,000 daltons. The three open frames 
are in this order from the 5' end of the viral genome and the pre- 
dicted 48,000-dalton polypeptide is a precursor of gag proteins, 
because it has an identical amino acid sequence to that of the NH2 
terminus of human T-cell leukemia virus (HTLV) p24. The open 
frames coding for 99,000- and 54,000-dalton polypeptides are 
thought to be the pol and env genes, respectively. On the 3' side 
of these three open frames, the ATLV sequence has four smaller 
open frames in various phases; these frames may code for 10,000-, 
11,000-, 12,000-, and 27,000-dalton polypeptides. Although one or 
some of these open frames could be the transforming gene of this 
virus, in preliminary analysis, DNA of this region has no homol- 
ogy with the normal human genome. 



Recently, retroviruses were independently isolated from hu- 
man T-cell leukemias by two groups. One retrovirus is human 
T-cell leukemia virus (HTLV) isolated by Gallo and colleagues 
from patients with cutaneous T-cell lymphoma (1, 2), and the 
other is adult T-cell leukemia virus (ATLV) isolated from pa- 
tients with adult T-cell leukemia (ATL) (3, 4). Recently, these 
two viruses have been shown to be closely related (5). ATLV 
was shown to be associated with ATL, which is a unique disease 
with T-cell malignancy (6), and the provirus genome was always 
detected in the chromosomal DNA of the leukemia cells (4). 
Recently, we reported molecular cloning of provirus DNA in- 
tegrated in the cell line MT-1 and the nucleotide sequence of 
the long terminal repeat (LTR) with 754 bases (7), and we also 
proposed that ATLV might be distinct from other known animal 
retroviruses (7). From these previous observations, identifi- 
cation of genetic structure and the gene products seemed to be 
of great importance in understanding the origin of the virus and 
the mechanisms of leukemogenesis by this virus. For this pur- 
pose, we isolated a clone (λATK-1) of the provirus genome in- 
tegrated in ATL cell DNA. 

This paper reports the complete 9,032-nucleotide sequence 
of the proviral genome cloned in λATK-1 and the amino acid 
sequence predicted for the putative proteins. 

MATERIALS AND METHODS 
Cloning and Sequence Analysis of Provirus DNA of ATLV 
Integrated in Leukemia Cells. DNA was extracted from pe- 
ripheral blood cells of a patient (K. K.) with ATL, digested with 
EcoRI, and separated by electrophoresis in agarose gel. DNA 
fractions of the 17-kilobase fragment containing the provirus 
were extracted, ligated to the EcoRI site of Charon 4A phage 
DNA, and subjected to in vitro packaging as described by Blatt- 
ner et al. (8). By screening with viral [32P]cDNA, recombinant phage 
λATK-1 was isolated. The DNA fragment cloned in λATK-1 
was excised by EcoRI and cleaved into several fragments with 
restriction endonucleases for subcloning in plasmid pBR322. 
The nucleotide sequence of the fragments was determined by 
the procedure of Maxam and Gilbert (9). 

RESULTS 

Molecular Cloning and Sequence Analysis Strategy. Pre- 
viously we reported the molecular cloning (λATM-1) of the pro- 
virus genome from cell line MT-1 and identified the LTR struc- 
ture (7). However, this time we have isolated a new provirus 
clone λATK-1 directly from DNA of leukemia cells of an ATL 
patient for further analysis. 

A simple restriction cleavage map of the inserted fragment 
in λATK-1 was constructed to subclone the regions containing 
provirus into pBR322. As shown in Fig. 1, BamHI divided the 
viral sequence into three fragments and these were subcloned 
into pBR322; thus, pATK-03, pATK-06, and pATK-08 were ob- 
tained. Plasmid pATK-100, constructed from the Pst I fragment 
of the λATK-1 insert, contained two BamHI junctions between 
the subclones described above. The plasmids pATK-03, pATK- 
06, and pATK-08 were digested with Pst I, Sal I, and Sma I, 
respectively, and the fragments were subjected to sequence 
analysis in both strands after further digestions with Hpa II, 
Sau3AI, HinfI, or other restriction endonucleases. The deter- 
mined sequences of pATK-03, pATK-06, and pATK-08 were 
overlapped by sequence analysis across the two BamHI sites in 
the clone pATK-100. Fig. 2 shows the 9,032-nucleotide se- 
quence of the constructed whole provirus genome with two 
LTRs, together with the cellular flanking sequences. 
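
The subcloning step above relies on BamHI cutting the viral insert at two internal sites to give three fragments. As an illustration only, the following Python sketch shows how such a digest could be simulated on a linear sequence. The function name `digest`, the toy sequence, and the offset convention are assumptions made for this example and are not taken from the paper; BamHI is modeled as cutting G^GATCC, one base into its recognition site.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site where the cut falls
    (1 for BamHI, which cuts between the first G and GATCC).
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect the cut position for every occurrence of the site.
    cut_points = []
    start = sequence.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    # Slice the sequence between consecutive cut positions.
    fragments = []
    previous = 0
    for point in cut_points:
        fragments.append(sequence[previous:point])
        previous = point
    fragments.append(sequence[previous:])
    return fragments


# Hypothetical toy insert with two BamHI sites, giving three fragments.
toy_insert = "AAAAGGATCCTTTTTTGGATCCCCCC"
toy_fragments = digest(toy_insert, "GGATCC", 1)
```

Two internal sites in a linear molecule always give three fragments, which is the situation described for the subclones pATK-03, pATK-06, and pATK-08.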

DISCUSSION 

Provirus Structure. The LTR structure (U3-R-U5) is thought 
to play essential roles in integration of provirus DNA into the 
host chromosomal DNA and also in regulation of transcription 
of the provirus genome (10, 11). The provirus DNA in λATK- 



Abbreviations: ATL, adult T-cell leukemia; ATLV, ATL virus; HTLV, 
human T-cell leukemia virus; LTR, long terminal repeat. 
Fig. 1. Restriction map of ATLV provirus clones. The provirus DNA 
is shown by the thick line with a LTR (box) at each end. The positions 
of the inserts from clones pATK-03, pATK-06, pATK-08, and pATK-100 
are shown under the full provirus genome in λATK-1. Restriction sites 
shown: EcoRI, Sma I, Pst I, and BamHI. kbp, kilobase pairs. 

1 contained two direct repeats of the LTR sequence, one at each 
end, and the structural features were similar to those in λATM- 
1, which was isolated from cell line MT-1 (7). Comparison of 
these two clones revealed the following features. (i) Sequences 
of the LTRs are identical except for 6 base changes at positions 
38, C to T; 90, G to A; 146, A to G; 209, G to A; 316, A to G; 
481, G to A; and one base (A) insertion at position 190. (ii) Cel- 
lular flanking sequences are directly repeated by 6 bases in both 
clones, but the sequences themselves are different, reflecting 
different integration sites (Fig. 3). Previously, we reported 7- 
base direct repeats of cellular sequences in λATM-1, but care- 
ful reinvestigation demonstrated that there are in fact 6-base 
repeats. (iii) The lengths of the viral sequences between the 
two LTRs are identical within the limits of experimental errors, 
although the nucleotide sequence of λATM-1 was not fully de- 
termined. The above results indicate that two clones, from cell 
line and leukemia blood cells, represent a similar ATLV ge- 
nome. 

The unique structures of the LTR previously reported (7) 
have also been confirmed in this paper. These are (i) the ex- 
tremely long size of R (terminally redundant sequence of ge- 
nomic RNA) with 229 bases and (ii) the absence of the poly(A) 
signal around the poly(A) site, which is the end of R. With few 
exceptions, all eukaryotic mRNA containing poly(A) contained 
the poly(A) signal A-A-T-A-A-A at 10-30 bases upstream of the 
poly(A) site, but from the sequence of ATLV LTR, we spec- 
ulated in the previous paper (7) that the poly(A) signal is dis- 
pensable for polyadenylylation. However, the nucleotide se- 
quence in the LTR was found to be arranged into a possible 
secondary structure (Fig. 4), which may explain why transcrip- 
tion terminates within the 3' LTR but does not terminate in the 
5' LTR. In the 3' LTR, the RNA transcript that had been initi- 
ated at the 5' LTR would form a hairpin structure, as shown in 
Fig. 4; thus, the poly(A) signal A-A-T-A-A-A, which is located 
before the "TATA" box or at 276 bases upstream of the poly(A) 
site, is arranged into 20 bases before the poly(A) site. In this 
structure, the signal A-A-T-A-A-A might become effective in 
the RNA level. But in the 5' LTR, transcription starts from the 
cap site, which is located in the loop; therefore, the RNA tran- 
script lacks the poly(A) signal, thus allowing further transcrip- 
tion. A model for inactivation of the A-A-T-A-A-A signal by a 
possible secondary structure was also proposed in the LTR of 
murine leukemia virus by Benz et al. (12). Our model for ATLV 
suggests that signals separated by a long nucleotide sequence 
could be aligned into functional form by conformational rear- 
rangements; therefore, a definite structure in the primary se- 
quences might not necessarily be required. However, this could 
be an exceptional case. 
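
The rule used in this paragraph, that a functional A-A-T-A-A-A signal normally lies 10-30 bases upstream of the poly(A) site, can be expressed as a simple check on the primary sequence. The Python sketch below is illustrative only: the function name, the 0-based indexing, and the choice to measure distance from the first base of the hexamer are assumptions, and the toy sequence is not the ATLV LTR.

```python
def has_polya_signal(sequence, polya_site, signal="AATAAA", min_up=10, max_up=30):
    """Return True if `signal` starts between min_up and max_up bases
    upstream of polya_site (a 0-based index into sequence)."""
    sequence = sequence.upper()
    for distance in range(min_up, max_up + 1):
        start = polya_site - distance
        if start >= 0 and sequence[start:start + len(signal)] == signal:
            return True
    return False


# Hypothetical toy sequence: the hexamer starts at index 50.
toy_ltr = "C" * 50 + "AATAAA" + "C" * 44

near_site = has_polya_signal(toy_ltr, 70)  # hexamer is 20 bases upstream
far_site = has_polya_signal(toy_ltr, 95)   # hexamer is 45 bases upstream
```

A signal 276 bases upstream of the site, as in the ATLV LTR, fails this primary-sequence test, which is why the authors invoke a secondary structure that brings the signal to about 20 bases from the poly(A) site.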

Capacity of the Genome To Code the Proteins. In general, 
replication-competent retroviruses have a common gene or- 
ganization that is gag, pol, and env in this order from the 5' end 
of the genomic RNA (13). The DNA sequence of ATLV con- 
tained three large open reading frames and four additional smaller 
ones (Fig. 2). Other possible open frames in the various phases 
are <200 bases, corresponding to a coding capacity for 70 amino 
acids. The three large reading frames probably correspond to 
gag, pol, and env because of their positions and for reasons dis- 
cussed later. 

gag gene. The first open frame, which starts from the ATG 
codon at position 802 and terminates with TAA at position 2,089, 
could code for a 48,000-dalton protein consisting of 429 amino 
acids. The recently reported NH2-terminal sequence of 25 amino 
acids of p24 in HTLV (14), which is similar to ATLV (5), is iden- 
tical to a part of this 48,000-dalton protein, which starts from 
proline at position 1,192, as marked in Fig. 2. The COOH ter- 
minus of p24 of HTLV is leucine (14) and this may correspond 
to the leucine at position 1,531. The predicted p24 of ATLV has 
a molecular mass of 23,940 daltons and its amino acid com- 
position is very similar to that of p24 of HTLV reported by 
Oroszlan et al. (Table 1) (14). This finding is direct evidence that 
p24 is virus encoded and also is consistent with the fact that an 
antibody against p24 of HTLV is crossreactive with ATLV an- 
tigens (15). Thus, the first large open frame appears to be the 
gag gene coding for a gag-precursor protein, Pr48(gag). To form 
p24, the Pr48(gag) should be cleaved into at least three proteins, 
that is, a 14,000-dalton protein from the NH2-terminal, a 24,000- 
dalton protein from the middle, and a 9,000-dalton protein from 
the COOH-terminal portions of the Pr48(gag). The molecular 
masses of the presumed polypeptides may correspond to the 
17,000-, 24,000-, and 11,000-dalton proteins, within the limits 
of experimental errors; these proteins were found previously to 
be associated with ATLV virions (4). 
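
The coding capacity quoted for the gag frame can be checked with simple arithmetic. The Python sketch below assumes the frame opens at position 802, the start coordinate that is consistent with 429 codons ending at the TAA at position 2,089, and uses an average residue mass of 110 daltons, which is a common rule of thumb and not a value from the paper. The function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS = 110  # daltons per amino acid; rule-of-thumb assumption


def codon_count(start, stop):
    """Number of codons from the first base of the start codon up to,
    but not including, the first base of the stop codon."""
    span = stop - start
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


def approximate_mass(residues):
    """Rough protein mass in daltons from the residue count."""
    return residues * AVERAGE_RESIDUE_MASS


gag_residues = codon_count(802, 2089)      # (2089 - 802) / 3 = 429
gag_mass = approximate_mass(gag_residues)  # 429 * 110 = 47,190 daltons
```

The result, about 47,000 daltons for 429 residues, agrees with the 48,000-dalton figure given for the gag precursor to within the accuracy of the rule of thumb.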

pol gene. In animal retroviruses, the pol gene is located after 
the gag gene and is translated into the gag-pol polyprotein by 
changing the reading frame after splicing of the genomic RNA 
(ref. 16) or by suppressing one termination codon, which ap- 
pears after the gag gene in the frame (17). Because ATLV has 
the general structural features of the retrovirus genome, such 
as LTR structure and tRNA binding site (7), it is reasonable to 
expect that ATLV has the usual gene organization. Thus, the 
second reading frame from GCC at position 2,498 to TAA at 
position 5,185 is expected to be the pol gene coding for reverse 
transcriptase. This is the largest open frame and it can code for 

Table 1. Amino acid composition of p24 

Amino acid    p24 of ATLV    p24 of HTLV* 
Asn                 9            {2r 
Asp                10 
Thr                 9            10 
Ser                13            14 
Gln                21            36‡ 
Glu                 9 
Pro                18            22 
Gly                11            15 
Ala                20            24 
Cys                 3 
Val                 9             7 
Met                 4             4 
Ile                 8             8 
Leu                28            32 
Tyr                 5             6 
Phe                 4             5 
His                 3             9 
Lys                10            12 
Arg                11            11 
Trp                 4 

* Oroszlan et al. (14). 
† Asn and Asp. 
‡ Gln and Glu. 
Fig. 3. Nucleotide sequences of the virus-cellular junction in the two clones λATK-1 and λATM-1. 



896 amino acids, corresponding to a 99,000-dalton protein. This 
molecular mass is similar to that of the known reverse tran- 
scriptase, but we could not define the NH2 terminus, because 
no structural information on the enzyme of ATLV or HTLV is 
available. Because there are several termination codons in every 
reading frame after the gag gene (at positions 2,089, 2,161, 
2,182, 2,239, 2,257, 2,272, 2,347, 2,422, 2,455, and 2,495 in 
the frame for gag and pol (frame I); positions 2,123, 2,156, 2,198, 
2,258, and 2,435 in frame II; and positions 2,316, 2,370, 2,466, 
2,415, and 2,448 in frame III), splicing of the genomic RNA is 
expected to eliminate the stop codons to read through gag to 
the putative pol gene, although we have no evidence for a pos- 
sible presence of a polyprotein of gag-pol. 

env gene. The third large open frame, which starts at the 
ATG codon at position 5,180 and terminates with the TAA co- 
don at position 6,644, has the capacity to code for a 54,000-dal- 
ton protein composed of 488 amino acids. This frame and the 
predicted amino acids have the following features in common 
with the env gene products of animal retroviruses. (i) The ATG 
codon at position 5,180 for initiation of the 54,000-dalton pro- 
tein is located within the putative pol gene overlapping by 5 
bases. Similar overlappings between pol and env are also ob- 
served in Rous sarcoma virus (D. Schwarz, R. Tizard, and W. 
Gilbert, personal communication) and murine leukemia virus 
genomes (18). (ii) About 20 amino acids of the NH2-terminal 
portion are rich in hydrophobic residues, and this characteristic 
is similar to that of signal peptides proposed for the env gene 
product of Rous sarcoma virus and murine leukemia virus (18). 
(iii) The 54,000-dalton protein contains five possible sites for 
glycosylation, that is, Asn-X-Thr/Ser sequences (19) at posi- 
tions 5,597, 5,843, 5,909, 5,993, and 6,389. Because the env 
gene products are generally glycoproteins, presence of the sites 
for glycosylation is expected to be essential, although it may not 
be enough. The product of the env of ATLV or HTLV has not 
been identified, but the characteristics of the putative 54,000- 
dalton protein described above suggest that this open frame is 
the env gene rather than the onc gene. 
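
The Asn-X-Thr/Ser motif used above to flag possible glycosylation sites can be located in a predicted protein with a short pattern search. The Python sketch below is a minimal illustration: the function name and the toy peptide are hypothetical, the pattern follows the motif exactly as stated in the text (many later definitions also exclude proline at X), and it returns 1-based residue positions, whereas the paper reports nucleotide coordinates.

```python
import re

# Asn, any residue, then Thr or Ser. The lookahead lets overlapping
# motifs (for example N-N-T-S) each be reported.
SEQUON = re.compile(r"(?=N[A-Z][ST])")


def find_sequons(protein):
    """Return 1-based residue positions of Asn in Asn-X-Thr/Ser motifs."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide in one-letter code.
toy_peptide = "MNATQQNGSNNTS"
toy_sites = find_sequons(toy_peptide)
```

As the text notes, the presence of such motifs is expected of an env product but is not by itself proof of glycosylation.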

Other genes? In addition to gag, pol, and env, the ATLV se- 
quence determined has four extra open frames, as indicated in 
Fig. 2, which have capacities to code for proteins pX-I to pX- 
IV, with molecular masses of 11,000, 10,000, 12,000, and 27,000 
daltons, respectively. Although the presence of these proteins 

Fig. 4. Possible secondary structure of the nucleotide sequence 
around the cap site and poly(A) site in the LTR. 



in infected or leukemia cells remains to be studied, some of 
them might have functions in the process of transformation of 
infected T cells. If some of these sequences have the common 
features with the known onc genes in acute leukemia viruses, 
similar nucleotide sequences are expected to be present in nor- 
mal human DNA. However, the subcloned DNA fragment con- 
taining this region did not significantly hybridize with normal 
human DNA in Southern blotting analysis. This preliminary 
result indicated that the region containing four extra open frames 
is not homologous with the human c-onc genes. Similar ex- 
periments using the other parts of viral DNA fragments sug- 
gested that ATLV has no onc gene derived from the human ge- 
nome; however, it is possible that ATLV may contain a gene 
that is involved in induction of abnormal T-cell proliferation 
but not derived from the human DNA. 

Finally, it should be pointed out that the predicted viral genes 
or gene products could be tentative, because the provirus ana- 
lyzed in this paper is that integrated in leukemia cells, and we 
have no direct evidence for the replicative competence of this 
provirus, including the viral infection. 

The authors thank Dr. H. Sugano for valuable discussion and en- 
couragement during this work. This work was supported in part by a 
Grant-in-Aid for Cancer Research from the Ministry of Education, Sci- 
ence and Culture of Japan. 



Structure of 3' Terminal Region of Type II Human 
T Lymphotropic Virus: Evidence for New Coding Region 

Abstract. The sequence of the 3' terminus of the human T lymphotropic virus type 
II (HTLV-II) was determined and compared to the corresponding sequence of 
HTLV-I. The 1557-nucleotide-long sequence can be divided into a 5' region that is 
not conserved between the two viruses, and a 3', 1011-nucleotide-long region that is 
highly conserved and that corresponds precisely with a long open reading frame for 
both HTLV-I and -II. The proteins that could be encoded by these open reading 
frames have a molecular weight of about 38,000 and are closely related in primary 
amino acid sequence. The genomic structure in the 3' region of HTLV was found to 
be similar to that of bovine leukemia virus. 



host factors encoded by dominant alleles 
at the fv-1 locus (13). 

To our knowledge, these results are 
the first report of a viral capsid protein 
playing a critical role in the congenital 
transmission of a retrovirus. Whether 
capsid proteins affect the replication of 
other families of retroviruses in repro- 
ductive tissue is not known. However, 
since the ability to undergo efficient con- 
genital transmission has survival value 
for exogenous but not endogenous virus- 
es, the major capsid proteins for all exog- 
enous and endogenous viruses may have 
undergone selection for their ability to 
ensure or restrict the replication of virus 
in reproductive tissue. If so, the capsid 
proteins of exogenous and endogenous 
viruses may provide genes that can be 
used to construct viruses that either will 
or will not undergo congenital transmis- 
sion. 

Harriet L. Robinson 
Worcester Foundation for 
Experimental Biology, Shrewsbury, 
Massachusetts 01545 

Robert N. Eisenman 
Fred Hutchinson Cancer Research 
Center, Seattle, Washington 98104 
The human T lymphotropic viruses 
(HTLV) are a family of retroviruses that 
are associated with T-cell abnormalities 
(1). Isolates known as HTLV-I are asso- 
ciated with an aggressive form of adult 
T-cell leukemia/lymphoma (1). An 
infrequent isolate known as HTLV-II 
was first identified in a patient with a T- 
cell variant of hairy cell leukemia (2). 
Recently, some viruses collectively 
called HTLV-III were isolated from pa- 
tients with the acquired immune defi- 
ciency syndrome (3). 

The genomes of HTLV-I and -II differ 
from those of the nonacute retroviruses, 
which encode only the gag, pol, and env 
genes, in that they have an additional 
sequence that is approximately 1600 nu- 
cleotides long. This sequence is located 
between the 3' end of the env gene and 
the 5' end of the U3 region of the proviral 
long terminal repeat (LTR) (4). 

Although this sequence occupies a po- 
sition similar to the src gene in Rous 
sarcoma virus, it is not homologous to 
conserved mammalian genes and there- 
fore differs from the oncogenes of trans- 
forming retroviruses (4). There is some 
evidence that this region contains a func- 
tional gene. Heteroduplex analysis of 
HTLV-I and -II reveals a conserved se- 
quence about 1000 nucleotides long near 
the 3' terminus of the genome (5). 
Spliced messenger RNA (mRNA) spe- 
cies that contain sequences that are 
unique to the 5' end of the viral genome 
(U5 LTR sequences) and a portion of the 
3' sequence are observed in HTLV-in- 
fected cells and in some fresh tumor cells 
(6). Seiki et al. (4) note that several open 
reading frames occur within the 3' se- 
quence of HTLV-I. 

To obtain a clearer understanding of 
the potential role of the 3' region of 
HTLV, we determined the primary nu- 
cleotide sequence of the region located 
between the 3' end of the env gene and 
the LTR of a cloned HTLV-II provirus, 
M015A (7). 

The nucleotide sequence of 1557 bases 
of the 3' terminal region of HTLV-II is 
presented in Fig. 1. This sequence can be 
divided into two regions. One region, 546 
nucleotides long, is located at the 5' end 
of the sequence and has either no or very 
little similarity to the corresponding se- 
quences in HTLV-I. For this reason we 
call this sequence the nonconserved re- 
gion (NCR). A second region, 1011 nu- 
cleotides long, comprises the 3' portion 
of this sequence. This sequence is very 
similar to that of HTLV-I and is identical 
at 765 of 1011 nucleotides (76 percent 
identity). 
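
The identity figure quoted above is the fraction of aligned positions that match. The Python sketch below reproduces the arithmetic for the reported counts and shows how the same measure could be computed from two aligned sequences. The function name and the toy sequences are hypothetical, and counting gap positions in the denominator but never as matches is an assumption; the paper does not state how gaps were scored.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned positions carrying the same base or residue.

    Both inputs must already be aligned to equal length; '-' marks a gap
    and is never counted as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Counts reported in the text for the conserved 3' region.
reported_identity = 100.0 * 765 / 1011  # about 75.7, quoted as 76 percent

# Hypothetical toy alignment: three of four positions match.
toy_identity = percent_identity("ACGT", "ACGA")
```

The two region lengths are also internally consistent: 546 nonconserved plus 1011 conserved nucleotides gives the 1557 bases sequenced.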

A new gene? The perimeters of the 
1011-nucleotide sequence of the HTLV- 
II genome correspond precisely with a 
single long open reading frame capable 
of encoding a polypeptide 337 amino 
acids long. A corresponding sequence of 
HTLV-I also encompasses a single long 
open reading frame capable of encoding 
a polypeptide 357 amino acids long. We 
call the nucleotide sequence containing 
these long open reading frames the LOR 
region (nucleotides 566 to 1557 in 
HTLV-II) (Fig. 1). 

The predicted amino acid sequences of 
both polypeptides are presented in Fig. 
1. The potential proteins encoded by the 
LOR regions of HTLV-I and -II are of 
approximately the same length and are 
identical in 259 of 337 of the amino acids 
(77 percent identity). The degree of simi- 
larity of these two proteins is even more 
striking if conservative amino acid sub- 
stitutions are considered (89 percent sim- 
ilar). The distribution of hydrophilic and 
hydrophobic regions of these proteins is 
remarkably similar (Fig. 2). 

We also note the existence of a splice 
acceptor consensus sequence located at 
the 5' end of the open reading frame (Fig. 
1). Although no methionine codon oc- 
curs at the 5' end of the open reading 
frames of HTLV-I and -II, a fusion pro- 
tein synthesized from a spliced mRNA 
can be envisioned. Several other splice 
acceptor sequences occur within this 
reading frame from which smaller fusion 
proteins might also be made. 

These observations suggest that the 3' 
terminal region of HTLV contains a new 
gene that encodes a protein with a mo- 
lecular weight of at least 38,000. Such a 
protein could be translated from the 2.2- 
kb spliced mRNA species containing 
LOR sequences found in HTLV-infected 
cells (6). A protein of molecular weight 
38,000 to 42,000 in HTLV-I-infected cell 
lines has been noted that is recognized 
by the serum of persons infected with 
HTLV-I, but not by serum from control 
subjects (8). 

Several other open reading frames ex- 
ist in the region between the env gene and 
the LTR of both HTLV-I and -II. Seiki et 
al. (4) have identified four such regions, 
pX I to pX IV. The pX IV region corre- 
sponds to the carboxyl terminus of the 
peptide that could be encoded by the 
LOR region. No region of predicted pro- 
tein similarity could be found in HTLV- 
II that corresponds to pX I or pX III. A 
further argument against the functional 
importance of pX I is that an 11-nucleo- 
tide deletion that destroys the pX I open 
reading frame occurs in an HTLV-Ic 
isolate with apparently complete biologi- 
cal activity (9). Another open reading 
frame in the LOR region of HTLV-II 
(nucleotides 530 to 1325) includes a re- 
gion exhibiting 65 percent amino acid 
homology to pX II. The significance of 
this similarity is not clear, because the 
pX II peptide is much shorter than the 
corresponding peptide in HTLV-II (87 
compared to 265 amino acids). Sequence 
similarity here could arise as a result of 
conservation of the LOR protein in the 
other open reading frame. 

We have also reported (8) that trans- 
acting factors, either directly encoded by 
Fig. 1 (left). Nucleotide sequence of the 
HTLV-II 3' terminal region and predicted 
amino acid sequence of its potential product. 
Plasmid DNA containing the 3' portion of 
M015A, an HTLV-II proviral clone (7), was 
cleaved with either Cla I or Bgl II, which cut 
the plasmid uniquely at a single site. After 
timed digestion with Bal 31 exonuclease, the 
ends were blunted with T4 DNA polymerase 
and synthetic linkers were added prior to 
recloning. Linker sites separated by incre- 
ments of 100 to 200 nucleotides were end- 
labeled and the fragments sequenced by the 
method of Maxam and Gilbert (19). The se- 
quence of the 3' region following the termina- 
tion codon for the envelope gene is presented 
for HTLV-II and HTLV-I. The HTLV-II se- 
quence is numbered according to the nucleo- 
tides following the envelope stop codon. As- 
terisks represent differences between the 
DNA sequences. The positions of a con- 
served splice acceptor consensus sequence 
and the 5' end of the LTR are noted. Note that 
the sequence is not well conserved 5' to the 
putative splice acceptor site but is very well 
conserved 3' to this site. The latter sequence 
corresponds to a long open reading frame (LOR) region. The predicted amino acid sequences of the potential products of the HTLV-I and HTLV- 
II 3' open reading frames are optimally aligned. Boxed regions indicate amino acid identity or conservative amino acid substitutions between the 
sequences. 

Fig. 2 (right). The open reading frames of the HTLV and BLV genomes. (A) The position of 3' open reading frames in the 
genomes of HTLV-I and -II and of BLV. The 3' end of the envelope gene is shown, as well as the 5' terminus of the LTR and the promoter 
(TATAA) sequence. The positions of the nonconserved regions and the open reading frames (hatched boxes) are displayed. (B) The relative 
hydrophilicity of the 3' open reading frame products of HTLV-I, HTLV-II, and BLV calculated according to the method of Hopp and Woods 
(20). Hydrophilic regions are shown above the axis, hydrophobic regions below. Dotted lines represent gaps introduced to maintain maximal 
alignment of protein sequence. 
HTLV or induced by HTLV infection, 
substantially augment gene expression 
directed by HTLV LTR sequences. The 
phenomenon of trans-activation distin- 
guishes HTLV from other retroviruses. 
The unusual structure of the 3' terminus 
of HTLV also distinguishes these from 
most other retroviruses. For this reason, 
we suggest that the protein encoded by 
the LOR region may mediate transcrip- 
tional changes observed in HTLV-infect- 
ed cells. In this regard, we note that 
transcription directed by the HTLV-I 
LTR is activated to high levels in a cell 
line, C81-66, that expresses the 42,000- 
dalton HTLV-I-associated protein but 
not HTLV gag, pol, or env products (5). 
We further suggest that the HTLV LOR 
product mediates both the trans-activat- 
ing and transforming effects of HTLV 
infection. We note that trans-acting tran- 
scriptional activities have been associat- 
ed with the transforming genes of other 
tumor viruses, notably adenovirus and 
SV40 (10, 11). The existence of a poten- 
tial transforming function within the 
HTLV genome may explain the ability of 
the virus to transform cells in vitro, as 
well as the absence of specific integra- 
tion sites in tumor cells and the absence 
of chronic viremia in target tissues (12- 
14). Such a transforming function would 
differ from that of other retroviruses 
because, unlike the oncogenes, the se- 
quence that encodes the putative trans- 
forming gene will not anneal to the highly 
conserved cellular sequences (4). 

Comparison with the bovine leukemia virus genome. We noticed 
that the 3' genome of another retrovirus, bovine leukemia 
virus (BLV), also contains an LOR frame located 3' to the 
envelope glycoprotein gene that could encode a protein of a 
size similar to that of HTLV (15, 16) (Fig. 2). There is 
evidence for the existence of a subgenomic spliced mRNA 
species that contains the 3' open reading frame but not the 
gag, pol, and env gene sequences in BLV-producing cell 
lines (17). 

Although the similarity in structure of the HTLV and BLV 
proteins is insufficient to indicate that they have a common 
functional role, the overall similarity in genomic 
structure, including the location of a 5' NCR and 3' LOR 
frame, and the previously described similarity in protein 
antigenicity of the two viruses (1, 14) suggests that they 
are functionally similar. Moreover, there is a similarity in 
the distribution of hydrophobic and hydrophilic regions of 
the HTLV and BLV polypeptides. We note that the disease 
induced by BLV has characteristics similar to those 
associated with HTLV-I, namely, a long latent period 
sometimes preceded by persistent lymphocytosis, an absence 
of chronic viremia in target organs preceding disease, and 
an absence of preferred integration sites in tumor cells 
(18). These features could be expected of viruses that 
contain an LOR product mediating transformation. 

The biology, structure, and pathology of HTLV and BLV differ 
from other transforming retroviruses such that we propose 
that they be considered a new subgroup of retroviruses 
distinct from both the nonacute transforming viruses that 
contain only the gag, pol, and env genes and the acute 
transforming viruses that encode oncogenes. 
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gin (2). Recently, a new group of viruses, 
HTLV-III, was isolated from patients with acquired immune 
deficiency syndrome (AIDS) (3). 

The envelope glycoprotein is the major antigen recognized by 
the serum of persons infected with HTLV (4). In this respect 
HTLV resembles several other retroviruses for which the 
envelope glycoprotein is typically the most antigenic viral 
polypeptide (5). Moreover, most 



Sequence of the Envelope Glycoprotein Gene of 
Type II Human T Lymphotropic Virus 

Abstract. The sequence of the envelope glycoprotein gene of type II human T 
lymphotropic virus (HTLV) is presented. The predicted amino acid sequence is 
similar to that of the corresponding protein of HTLV type I, in that the proteins share 
the same amino acids at 336 of 488 residues, and 68 of the 152 differences are of a 
conservative nature. The overall structural similarity of these proteins provides an 
explanation for the antigenic cross-reactivity observed among diverse members of 
the HTLV retrovirus family by procedures that assay for the viral envelope 
glycoprotein, for example, membrane immunofluorescence. 
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ABSTRACT The nucleic acid sequence of the 3' region of 
human T-cell leukemia virus type II (HTLV-II) proviral DNA 
was determined using a HTLV-II proviral clone that could be 
recovered as infectious, transforming virus. The sequence data 
indicate a region of unknown function of ≈1.6 kilobase pairs 
in the 3' region, analogous to the X region previously 
identified in human T-cell leukemia virus type I (HTLV-I). Three 
overlapping open reading frames are present in the X region of 
HTLV-II. One of these open reading frames, Xc, is most likely 
to encode a protein product, because it has greater predicted 
amino acid sequence homology (78%) with the X-IV region of 
HTLV-I and a greater percentage of its base differences with 
X-IV at the third nucleotide position of codons than do the 
other open reading frames. Sequences of the X-region that 
include the open reading frames are conserved in two deletion 
mutants of HTLV-II, which are associated with a subline of 
Mo cells with a decreased dependence on fetal bovine serum. 



Human T-cell leukemia viruses (HTLV) are associated with 
certain forms of human leukemias and lymphomas (1-5). At 
least two types of HTLV have been identified. HTLV type I 
(HTLV-I) is endemic to various regions of the world and is 
often associated with aggressive leukemias/lymphomas of 
mature T lymphocytes (3-5). HTLV type II (HTLV-II) was 
found in a single patient (Mo) with a T-cell variant of hairy-cell 
leukemia (1, 2, 6). This patient is alive and well 8 yr after 
splenectomy. 

Both HTLV-I and HTLV-II transform normal human pe- 
ripheral blood or cord blood T lymphocytes in vitro (7-10). 
These virus-transformed T cells have a helper-inducer phe- 
notype similar to that of leukemic cells in patients with 
HTLV-associated disease. Elucidation of the mechanism of 
in vitro transformation is relevant to the process of leukemo- 
genesis. However, the regions of the HTLV genome neces- 
sary for transformation have not been identified. Nucleic 
acid sequence analysis of the complete HTLV-I genome re- 
vealed a region at the 3' locus of the genome with no known 
function and without precedent in animal retroviruses other 
than bovine leukemia virus (11). This region, referred to as 
X, is suspected to encode protein(s) involved in the process 
of transformation. The X region does not cross-hybridize 
with normal human cellular DNA sequences and, therefore, 
does not encode a retroviral oncogene. 

HTLV-II has in vitro biological properties similar to but 
only limited homology with HTLV-I as determined by hy- 
bridization of the genomes and nucleic acid sequencing of 
the long terminal repeat (LTR) (12, 13). By nucleic acid sequence 
analysis we have identified a region comparable to X 
in an infectious and transformation-competent molecular 



The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertisement" 
in accordance with 18 U.S.C. §1734 solely to indicate this fact. 



clone of HTLV-II. The homology between the two viruses in 
this region was determined. 

MATERIALS AND METHODS 

Sequencing of HTLV-II DNA. Bacterial plasmid pH6-B3.5, 
which contains env, X, and a part of the LTR of HTLV-II, 
was used as a source of DNA. pH6-B3.5 was subcloned from 
a cloned infectious HTLV-II provirus, λH6. The sequencing 
method of Maxam and Gilbert was applied to 5'- or 
3'-end-labeled DNA fragments obtained by digestion of the DNA 
with restriction enzymes (14). Both strands of the DNA were 
sequenced. 

Comparison of Nucleotide and Amino Acid Sequences. 
Nucleotide or amino acid sequence homology was assessed 
using a computer program developed by Japan Soft 
Development. 

Transfection of HTLV-II Proviral DNA. The procedures 
used for transfection of lymphoid cells were described 
previously (15). 

Materials. Restriction enzymes were purchased from 
Takara Shuzo (Kyoto, Japan), Bethesda Research 
Laboratories, or New England Biolabs. Polynucleotide 
5'-hydroxyl-kinase and the large fragment of DNA polymerase I were 
from Boehringer Mannheim and Takara Shuzo, respectively. 
Radiolabeled nucleotides were from Amersham. 

RESULTS 

Nucleic Acid Sequence Analysis of the 3' Region of the 
HTLV-II Genome. Previous nucleic acid sequence analysis 
of HTLV-I revealed four potential open reading frames be- 
ginning with methionine codons in the 3' region of env (11). 
However, as the sequenced provirus was not recovered as 
an infectious virus it may not represent the genome of a rep- 
lication- and transformation-competent HTLV-I. Therefore, 
we first determined whether an apparently complete HTLV-II 
provirus clone, λH6, could be recovered as infectious 
virus capable of transforming normal human T lymphocytes. 

Since HTLV-II can replicate in some B-lymphoblastoid 
cell lines, we used a B-cell line for HTLV-II DNA 
transfection. The HTLV-II provirus was subcloned into the plasmid 
vector pSV2-neo. Protoplasts of E. coli HB101 containing 
the HTLV-II subclone, pH6-neo (Fig. 1A), were fused with 
WIL-2 cells and antibiotic G418-resistant clones of cells 
were subsequently selected. Of these G418-resistant B-cell 
clones, ≈25% expressed viral p19 and p24 antigens, as 
determined by indirect immunofluorescence (Fig. 1B), and viral 
RNA which was correctly initiated from the cap site of the 
LTR (15). These B cells were lethally irradiated and 
cocultivated with normal human peripheral blood lymphocytes as 
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Fig. 1. Transfection and expression of HTLV-II proviral DNA 
in human B cells. (A) The plasmid pH6-neo used for transfection is 
shown schematically. The complete HTLV-II provirus of λH6 (12) 
between flanking HindIII sites in cellular DNA was subcloned into 
the EcoRI site of the plasmid vector pSV2-neo. The DNAs cleaved 
with HindIII or EcoRI were treated with nuclease S1 to blunt the 
ends and were ligated. WIL-2 cells were transfected with pH6-neo 
by spheroplast fusion (15) and stable transformants were 
propagated. The thick line and open boxes represent the provirus. Zig-zag 
lines show cellular flanking sequences. The thin lines and shaded 
box represent pSV-neo sequences. Bgl II (Bg), HindIII (H), EcoRI 
(RI), and Pvu II (Pv) sites are indicated. (B) One of the stable B-cell 
transformants that express HTLV-II p24 antigens is shown. Fixed 
cells were treated with rabbit antisera directed against viral p24 
antigens and visualized by indirect immunofluorescence. 

described to test for HTLV-II transformation (10). Cell lines 
with the T-helper surface-antigen phenotype were 
established and shown to be infected with HTLV-II. No B-cell 
markers (surface membrane immunoglobulin, Epstein-Barr-virus 
nuclear or capsid antigen) were detected in the 
transformed peripheral blood cells. The nucleic acid sequence of 
the λH6 provirus therefore represents the genome of a 
replication- and transformation-competent HTLV-II. 
The nucleotide sequence of the 3' region of HTLV-II is 
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shown in Fig. 2; env ends one base before the first nucleotide 
of Fig. 2 and the X region of HTLV-II extends to position 
1559. 

Sequence Homology Between the X Regions of HTLV-II and 
HTLV-I. There is considerable nucleotide sequence homology 
between the X regions of HTLV-II and HTLV-I. Most of 
this sequence homology is in the 3' two-thirds of the X 
region that is coincident with the position of the major open 
reading frames in the X region (see below). In this region 
there is 75% nucleic acid sequence homology. In contrast 
there is only 33% sequence homology in the 5' one-third of 
the X region. The 5' region corresponds to the X-I region of 
HTLV-I (11). No long open reading frames are present in 
this region of HTLV-II. 
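
The regional homology figures quoted above are percentages of identical positions over a window of the alignment. A minimal sketch of that arithmetic follows, assuming the two sequences have already been aligned to equal length with `-` marking gaps. The function name and the choice to skip gapped columns are assumptions made here; the text does not say how the original program treated gaps.

```python
def percent_identity(a, b, start=0, end=None):
    """Percent of comparable aligned columns in [start, end) that match.

    `a` and `b` are aligned sequences of equal length. Columns where
    either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(a[start:end].upper(), b[start:end].upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no comparable columns in the requested window")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

Calling the function once on the 5' one-third and once on the 3' two-thirds of an aligned X region would give the two regional values being contrasted.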

Three open reading frames with overlapping sequences 
were identified in the nucleotide sequence of HTLV-II. 
These open reading frames Xa, Xb, and Xc correspond to 
the open reading frames X-II, X-III, and X-IV of HTLV-I 
(11). The derived amino acid sequence homologies for the 
codons between flanking termination codons are 62, 61, and 
75% for Xa/X-II, Xb/X-III, and Xc/X-IV, respectively. Xc 
and X-IV share a stretch of 335 corresponding amino acid 
codons uninterrupted by termination codons (Fig. 3), 
compared with 96 for Xa/X-II and 145 for Xb/X-III. Furthermore, 
there is 82% amino acid homology in the 112 codons 
upstream of the first conserved methionine codon of Xc/X-IV, 
indicating that the Xc region may encode a fused protein 
whose initiation codon is in a different region of the genome. 
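
The "stretch of codons uninterrupted by termination codons" used above can be found by scanning one reading frame and tracking the longest run between stop codons. The sketch below is illustrative only; the function name is hypothetical, and it works on a single DNA strand in one frame without requiring an initiating methionine, in line with the way the open frames are counted in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_stretch(seq, frame):
    """Return (start_index, n_codons) of the longest stop-free codon run.

    `frame` is 0, 1, or 2; `start_index` is the 0-based position of the
    first base of the run within `seq`.
    """
    seq = seq.upper()
    best_start, best_len = frame, 0
    run_start, run_len = frame, 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            # A stop codon closes the current run; the next one starts after it.
            run_start, run_len = i + 3, 0
        else:
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    return best_start, best_len
```

Running it for each of the three frames of the X region and comparing the returned codon counts reproduces the kind of comparison made between Xa, Xb, and Xc.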

Frequency of Base Changes at the Third Nucleotide Position 
in the Open Reading Frames. The frequency of base changes 
between divergent sequences in each of the three positions 
in amino acid codons has been used as a measure of evolu- 
tionary conservation and, therefore, functional significance 
of an open reading frame: the greater the frequency of third 
nucleotide changes relative to the first and second, the more 
likely that the reading frame encodes a protein that has been 
conserved during evolution. 

The frequency of base changes in the open reading frames 
of HTLV was calculated by comparing the sequences of the 
corresponding open reading frames in HTLV-I and HTLV-II. 
The number of mismatched bases between the Xa region 
(from nucleotides 530 to 817) in HTLV-II and the 
corresponding region in the open reading frame for X-II in HTLV-I 
is 58 mismatched nucleotides out of 288 nucleotides (only 
amino acid codons between termination codons are included 
in this calculation). Fourteen percent of these mismatches 
are at the third position of codons in this reading frame. In 
the Xb/X-III region, 28% of the mismatched bases are at the 
third position. However, for Xc/X-IV, 66% of the 
mismatched bases are located at the third position, a 
significantly greater frequency than that of either of the other two 
reading frames. Thus, Xc is most likely to encode a protein 
product. 
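
The third-position comparison can be written down in a few lines. The sketch below is an illustration under stated assumptions, not the program used by the authors: it takes two gap-free sequences already aligned in the reading frame of interest, and the function names are hypothetical.

```python
def mismatches_by_codon_position(a, b):
    """Count mismatched bases at codon positions 1, 2, and 3.

    `a` and `b` must be in-frame, gap-free, and of equal length that is
    a multiple of three. Returns a list [n_pos1, n_pos2, n_pos3].
    """
    if len(a) != len(b) or len(a) % 3 != 0:
        raise ValueError("sequences must be equal-length whole codons")
    counts = [0, 0, 0]
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if x != y:
            counts[i % 3] += 1
    return counts


def third_position_share(a, b):
    """Percent of all mismatches that fall at the third codon position."""
    counts = mismatches_by_codon_position(a, b)
    total = sum(counts)
    return 100.0 * counts[2] / total if total else 0.0
```

If mismatches were spread evenly over the three codon positions, the share would be about 33%, which is the baseline against which the 14%, 28%, and 66% values in the text can be read.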

Conservation of the 3' Region of HTLV-II DNA in Deletion 
Mutants. Molecular cloning and characterization of HTLV-II 
DNA from Mo cells demonstrated the presence of three 
forms of HTLV-II proviruses in these cells. The largest 
cloned provirus represents the complete 
replication-competent genome of HTLV-II as evidenced by the recovery of 
infectious, transforming HTLV-II by DNA transfection. The 
other two forms of HTLV-II DNA were defective, having 
large internal deletions of the viral genome. However, their 
LTRs were intact and both defective genomes could be 
packaged as infectious virus (12). The defective viruses are 
associated only with a subline of the HTLV-II-infected Mo 
cells that has growth properties distinct from the original Mo 
cell line: these cells have a decreased dependence on fetal 
bovine serum and clone spontaneously in methylcellulose 
and by limiting dilution. 

Restriction enzyme analysis showed that the larger defec- 
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[Fig. 2 alignment: nucleotide sequences of the X regions of HTLV-II and HTLV-I, with the open reading frames Xa, Xb, and Xc (HTLV-II) and X-I, X-II, X-III, and X-IV (HTLV-I) marked and asterisks under identical positions; the sequence rows are garbled in the scan.] 

Fig. 2. Nucleotide sequence homology between the X regions of HTLV-II and HTLV-I proviruses. The nucleotide next to the 3' end of the 
env gene is designated nucleotide 1. The open reading frames Xa, Xb, and Xc in HTLV-II and X-I, X-II, X-III, and X-IV in HTLV-I are shown. 
The 5' portion of the X-I open reading frame is in env. Xc and X-IV end in the LTRs. A putative splice acceptor site is indicated by an arrow. 
Asterisks indicate nucleotides that are identical in the two sequences. 

tive genome, typified by clone H9 (Fig. 4), has a deletion of 
≈2.0 kilobase pairs (kbp) with conservation of ≈5.0 kbp in 
the 5' region and ≈2.0 kbp in the 3' region. The smaller 
defective provirus, typified by clone H2, has a deletion of 
nearly the entire internal sequence of HTLV-II. Excluding the 
LTR, <2.0 kbp of the sequence is conserved. Detailed 
restriction enzyme analysis demonstrated that most of the 
conserved sequence in the 3' region is the X region of HTLV-II. 
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HTLV-II : LQSCLLSAHFLG FGQSLLYGYP VYVFGDCVQA DWCPVSCGLC STRLHRHALL aTCPEHQLTV 62 
HTLV-I  : --PCLLSAHFPG FGQSLLFGYP VYVFGDCVQC DWCPISCCLC SARLHRHALL ATCPEHQITW 60 

HTLV-II : DPIDCRVVSS PLQYLIPRLP SFPTQRTSRT LKVLTPPTTP VSPKVPPaFF QSMRSCHTPYR 122 
HTLV-I  : DPIDCRVIGS ALQFLIPRLP SFPTQRTSSCT LKVLTPPITH TTPNIPPSFL QAMRKYSPFR 120 

HTLV-II : NGCLEPTLCD QLPSLaFPEP GLRPQNIYTT WCKTVVCLYL YQLSPPMTWP LIPHVtFCHP 182 
HTLV-I  : NCYMEPTLCQ HLPTLSFPDP CLRPQMLYTL WCGSVVCMYL YQLSPPITWP LLPHVIFCHP 180 

HTLV-II : RQLGAFLTKV PLiCRLEELLY KMFLHTCTVt VLPEDDLPTT MFQPVRAPCI QTAWCTGLLP 242 
HTLV-I  : GQLCAFLTNV PYKRIEELLY /CtSLTTCALl [LPEDCLPTT LFQPARAPVT LTAWQNCLLP 240 

HTLV-II : YHSILTTPGL tVTFNDGSPH ISCPYPKACQ PSLVVQSSLL IFEKFETKaF HPSYLLSHQL 302 
HTLV-I  : FHSTLTTPGL IVTFTDGTPM ISGPCPiCDCQ PSLVLQSSSF IFHKFQTKAY HPSFLLSHGL 300 

HTLV-II : IQYSSFHNLH LLFDEYTNIP VSILFNKEEA DDNCO 337 
HTLV-I  : tQYSSFHSLH LLFEEYTNIP [SLLFNEKEA ODNOHEPQIS PGGLEPPSEK HFRETEV 357 

Fig. 3. Homology of predicted amino acid sequences encoded by the open reading frames Xc (HTLV-II) and X-IV (HTLV-I). Asterisks 
indicate identical amino acids in the two sequences. Amino acids are represented by standard one-letter abbreviations (16). 



Furthermore, the deletion endpoints occur at the 5' end 
of the X region, upstream of the large open reading frames 
(Fig. 4). 

DISCUSSION 

We have sequenced a 3' region of the HTLV-II genome of 
≈1.6 kbp with an unknown function. HTLV-II resembles 
HTLV-I in a number of its properties, including biological 
functions, such as lymphoid target-cell specificity and T-cell 
transformation (1, 2, 6-10), and conservation of important 
structural features within the LTR (13). The X region repre- 
sents another common structural feature that is present in 
the genomes of both HTLV types. The X region of HTLV-II 
has 61% sequence homology with the X region of HTLV-I, 
and shows homology as great as 75% in the region encom- 
passing the 3' two-thirds of the X region. The sequence con- 
servation in this part of the X region in both types of HTLV 
strongly suggests that the X region serves an important func- 
tion in virus replication and/or transformation. Three large 
open reading frames with overlapping sequences are present 
in the HTLV-II X region. If they began with initiation 
codons, these open reading frames would be sufficient to 
encode proteins of 15,100, 23,700, and 24,500 daltons. Al- 
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though the predicted amino acid sequence of HTLV-II in 
these regions shows ≈60% homology with those of the X-II 
and X-III regions and 75% homology with that of the X-IV 
region of HTLV-I, the positions of initiation and termination 
codons would result in proteins of different predicted sizes. 

Comparison of the LTR sequences of HTLV-II and 
HTLV-I indicates that these two viruses are only distantly 
related. It is likely that the two viruses evolved from a single 
ancestral virus and have retained common sequences that 
are important for replication (13). Of the corresponding open 
reading frames in the two viral genomes, Xc and X-IV share 
the greatest sequence homology, the longest stretch of 
contiguous codons uninterrupted by termination codons, and 
the greatest frequency of third-position differences relative 
to first- and second-position differences. Therefore, it is 
likely that Xc in HTLV-II and X-IV in HTLV-I encode 
functional proteins. 

The high predicted amino acid homology and relatively 
high frequency of third-position differences holds true for 
112 codons in Xc/X-IV located upstream of the first methio- 
nine codon. Therefore, it is probable that Xc is translated as 
a fused protein from a spliced mRNA. In this regard, it is 
interesting to note that a potential splice acceptor site is lo- 
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Fig. 4. Location of deletion endpoints in the HTLV-II H-2 and H-9 clones. The restriction enzyme map of the HTLV-II X region and 3' 
LTR (represented by a box) is illustrated. Restriction enzyme sites are shown for Acc I (A), Ava I (AV), BamHI (B), Bgl II (BG), Cla I (C), Pst I 
(P), and Xho I (X). Boundaries between env and X and between X and the LTR are indicated. The 3' deletion endpoints of the defective H-2 and 
H-9 clones, as determined by restriction enzyme mapping and subsequent hybridization analysis (unpublished data), are shown in reference to 
the λH-6 restriction enzyme map. The location of Xc is denoted by a bracketed line. bp, Base pairs. 
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cated near the 5' end of the Xc/X-IV region at nucleotide 
position 570. 

Since animal retroviruses for which nucleic acid sequence 
information is available do not have sequences comparable 
to the X region of HTLV, it is likely that this X region has a 
unique function in viral replication and cellular transforma- 
tion. The retention of the open reading frames in the X re- 
gion of the deletion mutants of HTLV-II may be relevant to 
that region's potential function in transformation, particular- 
ly since the deletion mutants are present only in a subline of 
the Mo cells having much less stringent growth requirements 
than the parental Mo cells. Identification of the proteins en- 
coded by the X region and X-region-specific mRNAs will be 
necessary to determine the significance of the X region. 

Note Added in Proof. While this work was in press, Haseltine et al. 
(17) published a sequence of the 3' region of HTLV-II that differs 
from that presented here at six nucleotide positions in the Xc region: 
four of these differences result in amino acid changes. These 
differences from our data may be due to sequence differences in the two 
provirus clones used for analysis; the significance, if any, of these 
differences must await demonstration of the infectivity of the cloned 
HTLV-II provirus used for sequencing (13). 
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